Contributing File
Warning: This documentation is under development.
Overall workflow:
- From the GitHub repository, open an issue about what you want to contribute.
- Develop your contribution on the development (dev) branch of the git repository.
- Run linting and tests locally. Confirm that everything is passing.
- Push changes to the development branch.
- From the GitHub repository, make a pull request.
Set up
Install the project and set up the environment.
IDE
If you're using VS Code, apply the example settings.
mkdir .vscode
cp vscode-settings.example.json .vscode/settings.json
Git & Python
Repository
Clone the repository.
git clone git@github.com:LostMa-ERC/heurist-etl-pipeline.git
Virtual Environment
Set up a virtual Python environment and install the package.
pip install --upgrade pip poetry
poetry install
Development branch
Move to the git repository's development (dev) branch. If you've never worked on the development branch, create it with checkout -b instead of checkout.
git checkout dev
git pull
Development
Before pushing changes to the repository, run linting and testing locally. The same checks run again, for all covered Python versions, when the changes are pushed to the remote repository.
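One way to make sure both checks run before every push is a small helper script. The sketch below is an assumption, not something shipped with the repository: the name run_checks and its runner parameter are hypothetical, and the two commands are the ones listed in the Linting and Testing sections of this guide.

```python
import subprocess

# Commands copied from the Linting and Testing sections of this guide.
CHECKS = [
    [
        "poetry", "run", "flake8",
        "--extend-exclude", "./heurist/mock_data/",
        "--max-line-length", "88",
    ],
    ["poetry", "run", "pytest"],
]


def run_checks(checks=CHECKS, runner=subprocess.run):
    """Run each check in order and stop at the first failure.

    Args:
        checks: Commands to run, each as a list of arguments.
        runner: Callable with the same interface as subprocess.run.
            Hypothetical hook so the function can be tried out
            without invoking poetry.

    Returns:
        The exit code of the first failing command, or 0 if all pass.
    """
    for command in checks:
        result = runner(command)
        if result.returncode != 0:
            return result.returncode
    return 0
```

Calling run_checks() from the repository root returns 0 only when both linting and tests pass, so the result can be passed to sys.exit in a script.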
Style guide
- Module names are written in snake case.
  - Example: record_validator.py
  - An exception is made for the modules of the pydantic.BaseXmlModel models in heurist/models/structural, e.g. DetailTypes.py.
- Classes are written in upper camel case, e.g. HeuristAPIClient.
- Functions and class methods have docstrings written in Google's format.
  - The description of what the function or method does is written in the imperative mood, e.g. "Construct a URL from path parameters."
  - When a function or method's parameters can be written in a single line and/or don't depend on complex class instances, write unit tests in the docstring with doctest.
    - Preface the doctest lines with the section header "Examples:".
    - On the next line, indent by 4 spaces before the doctest string, e.g. ">>> 1+1".
- The location of test modules depends on whether they're end-to-end (tests/e2e), integration (tests/integration), or unit tests (tests/unit).
  - From the relevant test directory, the test module is placed in a subdirectory named after the package's corresponding subdirectory.
  - For example, a unit test about heurist/api/client.py is written in the subdirectory tests/unit/api.
  - An exception is made for end-to-end tests, which test CLI commands from the tests/e2e directory.
- Test modules are written in snake case. Their names start with the element being tested and end with _test.py.
  - For example, a unit test about heurist/api/client.py is written in tests/unit/api/client_test.py.
- Complex SQL queries are written in individual SQL files in the sql/ directory, e.g. sql/query.sql. The query's text is then read in the sql/__init__.py module and made available as a constant, as follows:
from pathlib import Path

# Read the query text once, at import time, and expose it as a constant.
sql_file = Path(__file__).parent.joinpath("query.sql")
with open(sql_file) as f:
    QUERY = f.read()
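To illustrate the docstring conventions from the style guide, here is a minimal sketch of a function with a Google-format docstring, an imperative summary line, and a doctest under an "Examples:" header. The function build_url is hypothetical and is not taken from the package; it only serves to show the layout.

```python
def build_url(base: str, *parts: str) -> str:
    """Construct a URL from path parameters.

    Args:
        base: Root of the URL, with or without a trailing slash.
        *parts: Path segments to append, in order.

    Returns:
        The base and the segments joined by single slashes.

    Examples:
        >>> build_url("https://example.org/", "api", "records")
        'https://example.org/api/records'
    """
    # Strip slashes at the joints so segments are separated exactly once.
    segments = [base.rstrip("/")] + [p.strip("/") for p in parts]
    return "/".join(segments)
```

A doctest written this way can be checked on its own with python -m doctest followed by the path to the module.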
Linting
poetry run flake8 --extend-exclude ./heurist/mock_data/ --max-line-length 88
Testing
poetry run pytest
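The naming and location rules for unit tests in the style guide can be summarised as a small mapping from a module path to its test path. The helper below is a hypothetical sketch, not part of the package, and it assumes the module sits under a single top-level package directory such as heurist.

```python
from pathlib import PurePosixPath


def expected_unit_test_path(module_path: str) -> str:
    """Return where the unit test for a package module should live.

    Args:
        module_path: Path of the module relative to the repository root.

    Returns:
        The path of the matching test module under tests/unit.

    Examples:
        >>> expected_unit_test_path("heurist/api/client.py")
        'tests/unit/api/client_test.py'
    """
    module = PurePosixPath(module_path)
    # Drop the top-level package directory and keep its subdirectories.
    subdirs = module.parent.parts[1:]
    # The test module starts with the tested element and ends with _test.py.
    test_name = f"{module.stem}_test.py"
    return str(PurePosixPath("tests", "unit", *subdirs, test_name))
```

The same idea applies to integration tests by swapping tests/unit for tests/integration. End-to-end tests are the exception noted in the style guide and are not covered by this sketch.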