We love contributors, so first and foremost, thank you! We're actively working on our contributing guidelines, so this document is subject to change. Before you start, note that we adhere to the Contributor Covenant Code of Conduct and expect all of our contributors to adhere to it as well.
Snorkel uses tox to manage development environments.
To get started, install tox, clone Snorkel, then use tox to create a development environment:
pip3 install -U tox
git clone https://github.com/snorkel-team/snorkel
cd snorkel
tox --devenv .env
Running tox --devenv .env will create a virtual environment with Snorkel and all of its dependencies installed in the directory .env.
This can be used in a number of ways, e.g. with source .env/bin/activate or for linting in VSCode.
For example, you can simply activate this environment and start using Snorkel:
source .env/bin/activate
python3 -c "import snorkel.labeling; print(dir(snorkel.labeling))"
There are a number of useful tox commands defined:
tox -e py311 # Run unit tests with pytest in Python 3.11
tox -e coverage # Compute unit test coverage
tox -e spark # Run Spark-based tests (marked with @pytest.mark.spark)
tox -e complex # Run more complex, integration tests (marked with @pytest.mark.complex)
tox -e doctest # Run doctest on modules
tox -e check # Check style/linting with black, isort, and flake8
tox -e type # Run static type checking with mypy
tox -e fix # Fix style issues with black and isort
tox -e doc # Build documentation with Sphinx
tox # Run unit tests, doctests, style checks, linting, and type checking
Make sure to run tox before committing.
CI won't pass without tox succeeding.
As noted, we use a few additional tools that help to ensure that any commits or pull requests you submit conform to our established standards. We use the following packages:
- isort: import standardization
- black: automatic code formatting
- flake8: PEP8 linting
- mypy: static type checking
- pydocstyle: docstring compliance
- doctest-plus: check docstring code examples
The Snorkel maintainers are big fans of VSCode's Python tooling.
Here's a settings.json that takes advantage of the packages above (except isort) with in-line linting:
{
    "python.jediEnabled": true,
    "python.formatting.provider": "black",
    "python.linting.flake8Enabled": true,
    "python.linting.mypyEnabled": true,
    "python.linting.pydocstyleEnabled": true,
    "python.linting.pylintEnabled": false
}
Snorkel ♥ documentation.
We expect all PRs to add or update API documentation for any affected pieces of code.
We use NumPy style docstrings, and enforce style compliance with pydocstyle as indicated above.
Docstrings can be cumbersome to write, so we encourage people to use tooling to speed up the process.
For VSCode, we like autoDocstring.
Just install the extension and add the following configuration to the settings.json example above.
Because we use PEP 484 type hints, parameter types should be left out of the docstring (return types should still be included).
{
    "autoDocstring.docstringFormat": "numpy",
    "autoDocstring.guessTypes": false
}
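The following docstring illustrates the conventions above: NumPy sections, no parameter types, and the return type kept. It belongs to a hypothetical helper, majority_label, which is not part of Snorkel's API. Treating -1 as the abstain value is an assumption of this sketch. The Examples section is the kind of content that tox -e doctest runs.

```python
from typing import List


def majority_label(labels: List[int], abstain: int = -1) -> int:
    """Return the most common non-abstain label.

    Parameters
    ----------
    labels
        Label votes for a single data point
    abstain
        Value that marks an abstaining vote (assumed to be -1 here)

    Returns
    -------
    int
        Most frequent non-abstain label, or ``abstain`` if every vote abstains

    Examples
    --------
    >>> majority_label([1, 0, 1, -1])
    1
    >>> majority_label([-1, -1])
    -1
    """
    # Drop abstaining votes before counting
    votes = [label for label in labels if label != abstain]
    if not votes:
        return abstain
    # Ties resolve to whichever tied label appears first in ``votes``
    return max(votes, key=votes.count)
```

Parameter types appear only in the signature, so mypy checks them and the docstring cannot drift out of sync with them.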
There are some standards we follow that our tooling doesn't automatically check or enforce:
- Examples, examples, examples. We love examples in docstrings; they're often the best form of documentation. The Example or Examples section should come after Parameters but before Attributes. Running tox -e doctest will test your docstring examples.
- Make sure to add Attributes sections to docstrings to document public attributes of classes. The Attributes section should be the last part of the docstring.
- No need to document private methods or attributes.
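The class docstring below shows these rules together. Examples comes after Parameters, Attributes comes last, and the private _n_abstains attribute is not documented. VoteCounter and its method are hypothetical and were made up for this illustration.

```python
from typing import Dict, List


class VoteCounter:
    """Count label votes per class.

    Parameters
    ----------
    cardinality
        Number of possible classes

    Examples
    --------
    >>> counter = VoteCounter(cardinality=2)
    >>> counter.add([0, 1, 1, -1])
    >>> counter.counts
    {0: 1, 1: 2}

    Attributes
    ----------
    counts
        Number of votes seen for each class
    """

    def __init__(self, cardinality: int) -> None:
        self.counts: Dict[int, int] = {c: 0 for c in range(cardinality)}
        # Private attribute: no entry in the Attributes section needed
        self._n_abstains = 0

    def add(self, labels: List[int]) -> None:
        """Record a batch of votes, treating -1 as an abstain.

        Parameters
        ----------
        labels
            Label votes to record
        """
        for label in labels:
            if label == -1:
                self._n_abstains += 1
            else:
                self.counts[label] += 1
```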
Any test that runs longer than half a second should be marked with the @pytest.mark.complex decorator.
Typically, these will be integration tests or tests that verify complex properties like model convergence.
We exclude long-running tests from the default tox and CircleCI builds on non-main and non-release branches to keep things moving fast.
If you're touching areas of the code that could break a long-running test, you should include the results of tox -e complex in the PR's test plan.
To see the durations of the 10 longest-running tests, run:
tox -e py3 -- -m 'not complex and not spark' --durations 10
PySpark tests are invoked separately from the rest since they require installing Java and the large PySpark package.
They are executed on CircleCI, but a plain local tox command does not run them by default.
If you're making changes to Spark-based operators, make sure you have Java 8 installed locally and then run tox -e spark.
If you add a test that imports PySpark, mark it with the @pytest.mark.spark decorator.
Add the @pytest.mark.complex decorator as well if it runs a Spark action (e.g. .collect()).
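The sketch below shows a Spark test that carries both markers because it calls .collect(). The test name and data are hypothetical. PySpark is imported inside the test through pytest.importorskip so that the module can still be collected on machines without PySpark. That import style is a choice of this sketch, not a stated project rule.

```python
import pytest


@pytest.mark.spark
@pytest.mark.complex  # .collect() below is a Spark action
def test_spark_row_count() -> None:
    # Skip (rather than error) when PySpark is not installed
    pyspark_sql = pytest.importorskip("pyspark.sql")
    spark = pyspark_sql.SparkSession.builder.master("local[1]").getOrCreate()
    try:
        df = spark.createDataFrame([(0, "a"), (1, "b")], ["id", "text"])
        rows = df.collect()  # triggers a Spark job
        assert len(rows) == 2
    finally:
        spark.stop()
```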
When submitting a PR, make sure to use the preformatted template.
Except in special cases, all PRs should be against main.
Avoid using "staging branches" as much as possible.
If you want to add complicated features, please stack your PRs to ensure an effective review process.
It's unlikely that we'll approve any single PR over 500 lines.
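To check a branch against the 500-line guideline before opening a PR, you can use a small helper that sums the counts from git diff --numstat. This is a hypothetical sketch with three assumptions. It counts "lines" as added plus deleted lines. It compares against main using the three-dot (merge-base) form. It skips binary files, which numstat reports with "-" counts.

```python
import subprocess
from typing import Iterable


def count_changed_lines(numstat_lines: Iterable[str]) -> int:
    """Sum added and deleted lines from ``git diff --numstat`` output.

    Binary files show ``-`` for both counts and are skipped.
    """
    total = 0
    for line in numstat_lines:
        if not line.strip():
            continue
        # Each line is "<added>\t<deleted>\t<path>"
        added, deleted, _path = line.split("\t", 2)
        if added == "-" or deleted == "-":
            continue  # binary file: no line counts available
        total += int(added) + int(deleted)
    return total


def pr_size(base: str = "main") -> int:
    """Count lines changed on the current branch since it diverged from ``base``."""
    out = subprocess.run(
        ["git", "diff", "--numstat", f"{base}...HEAD"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return count_changed_lines(out.splitlines())
```

For example, run pr_size() from inside your Snorkel checkout. A result well above 500 suggests splitting the work into stacked PRs.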
Direct commits to main are blocked, and PRs require an approving review to merge into main. By convention, the Snorkel maintainers will review PRs when:
- An initial review has been requested
- A maintainer is tagged in the PR comments and asked to complete a review
We ask that you make sure initial CI checks are passing before requesting a review.
The PR author owns the test plan and has final say on correctness. Therefore, it is up to the PR author to give the final okay on merging (or merge their PR if they have write access).