| marp | theme |
|---|---|
| true | default |
A two-day workshop for bioinformaticians and molecular biologists with a focus on the TSO500 pipeline in InPreD
https://inpred.github.io/24-03_bioinfo_ws/
- Setup
- Development & Collaboration
- Nextflow
- tso500_nxf_workflow
- Python
- go to https://github.com/ and click on Sign up
- enter your email
- set a password
- choose a username
- choose email preferences
- solve the puzzle
- create your account
- find the activation code in the email you received
- select the desired options
- choose the free plan
- distributed version control system
- tracks history of changes committed by different contributors
- every developer has full copy of project and its history
git config --global user.name <your name>
git config --global user.email <your email>
git init
: initialises new git repository
git clone <repository url>
: creates local copy of remote repository
git add <file/s>
: stage new or changed files (anything that should be committed to the repository)
git commit -m "feat: my new feature"
: commit changes to the repository
<type>[optional scope]: <description>
feat
: new feature
fix
: patching bug
refactor
: code change that is neither feat nor fix
build
: build system related changes
perf
: improving performance
<type>[optional scope]: <description>
chore
: code unrelated changes, e.g. dependencies
style
: code change that does not change meaning
test
: changes to tests
docs
: adding/updating documentation
ci
: continuous integration, e.g. GitHub Actions
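The commit header format above can be checked mechanically. The following is a minimal sketch, assuming only the types listed on these slides are allowed and ignoring optional extras such as the `!` breaking-change marker; `parse_commit_header` is a hypothetical helper name, not part of git or any library.

```python
import re

# Types taken from the lists above; adjust to your project's conventions.
ALLOWED_TYPES = ("feat", "fix", "refactor", "build", "perf",
                 "chore", "style", "test", "docs", "ci")

# <type>[optional scope]: <description>
COMMIT_PATTERN = re.compile(
    r"^(?P<type>[a-z]+)"            # commit type
    r"(?:\((?P<scope>[^()]+)\))?"   # optional scope in parentheses
    r": (?P<description>\S.*)$"     # colon, space, non-empty description
)


def parse_commit_header(header):
    """Return (type, scope, description), or None if the header does not conform."""
    match = COMMIT_PATTERN.match(header)
    if match is None or match.group("type") not in ALLOWED_TYPES:
        return None
    return match.group("type"), match.group("scope"), match.group("description")
```

A check like this could run in a git hook or a CI job to reject non-conforming commit messages before they reach the shared history.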
git status
: overview of untracked, modified and staged changes
git branch
: show local branches
git merge
: merge branches
git pull
: load changes from remote counterpart
git push
: upload changes to remote counterpart
- start with two branches to record project history: main and develop
- each new feature resides in its own branch (feature branch)
- feature branch is generally created off latest develop commit
- upon feature completion, feature branch is merged into develop
- whenever you are ready to release, merge develop into main and tag it
- continuous integration (CI) and continuous deployment (CD)
- building, testing and deploying directly from GitHub
- set up by adding yaml instructions to .github/workflows
name: GitHub Actions Demo
on: [push]
jobs:
  Explore-GitHub-Actions:
    runs-on: ubuntu-latest
    steps:
      - run: echo "Hello world!"
name: Docker Build
on:
  push:
    branches:
      - main
      - develop
    tags:
      - '*.*.*'
jobs:
  test:
    name: Run unit tests
    runs-on: ubuntu-latest
    steps:
      -
        name: Check out the repo
        uses: actions/checkout@v4
      -
        name: Unit testing
        uses: fylein/python-pytest-github-action@v2
        with:
          args: pip3 install -r requirements.txt && pytest
...
...
  build:
    name: Build Image
    runs-on: ubuntu-latest
    needs: test
    steps:
      -
        name: Check out the repo
        uses: actions/checkout@v4
      -
        name: Lint Dockerfile
        uses: hadolint/[email protected]
      -
        name: Docker Meta
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: |
            inpred/local_app_prepper
          tags: |
            latest
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}
            type=semver,pattern={{major}}
      -
        name: Login to Dockerhub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      -
        name: Build and push image to Docker Hub
        uses: docker/build-push-action@v5
        with:
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
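To make the effect of the three `type=semver` patterns in the Docker Meta step more concrete, here is a rough illustration of the image tags one would expect for a given git tag. It is a sketch, not the action's actual implementation: it assumes a plain MAJOR.MINOR.PATCH git tag without prefix or pre-release suffix, and `expected_image_tags` is a hypothetical helper name.

```python
def expected_image_tags(git_tag):
    """Illustrate the image tags expected from the tag rules in the workflow above.

    Assumes git_tag looks like "1.2.3" (no "v" prefix, no pre-release part).
    """
    major, minor, patch = git_tag.split(".")
    return [
        "latest",                      # the plain "latest" entry
        f"{major}.{minor}.{patch}",    # pattern={{version}}
        f"{major}.{minor}",            # pattern={{major}}.{{minor}}
        major,                         # pattern={{major}}
    ]
```

Under these assumptions, pushing the tag 1.2.3 would publish the image as latest, 1.2.3, 1.2 and 1, so users can pin as loosely or as strictly as they like.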
- go to issues and create a New issue
- give the issue a descriptive title and a description and Submit new issue
- if you decide to work on the issue (own repository), Create a branch via the issue
- Change branch source to develop and Create branch
- load the new branch to your local repository, check it out and start working
- push your changes back to the remote
$ git pull
$ git checkout 4-new-fancy-feature
$ git add README.md
$ git commit -m "docs: updating docs"
$ git push
- for repositories you don't have access to, create a fork
- once you have a fork, git clone your forked repository
- create a new branch and work on that
- git push your changes back to the forked remote
- when you are done, go to pull requests and create a New pull request
- choose develop as base and your new feature branch (same repo or forked) for compare
- assign yourself, add at least one reviewer (cog icon), provide some context and Create pull request
- if you still want to work on the pull request, you can Convert to draft to let the reviewers know that it is not done yet
- otherwise you can just wait for them to review your changes
- as a reviewer, make sure you check your email notifications to see if there are pull requests waiting for you
- open the pull request and start the review in the Files changed tab
- you can leave comments and suggestions in the code by hovering over the line with the changes and clicking on +
- you can type your comment
- or you leave a suggestion, ideally you click Start a review to initialise the reviewing process
- when you are done with reviewing, Finish your review
- again, leave a comment if you like, and choose if you just want to Comment, Approve or Request changes
- you can add a general comment to the pull request under Conversation
- after the reviewer left their comments and suggestions, you can address them one by one by replying or applying the suggested changes
- whenever a certain comment/suggestion is handled (discussion comes to conclusion, suggestion was applied), you can resolve it
- as soon as the reviewers have given you an approval, you can finally Merge pull request
- go to https://github.com/InPreD/24-03_bioinfo_ws/
- create fork to your own account
- open an issue "test pull request" or similar and create a branch
- go to the branch and add a markdown file with your first name and favorite emoji to the participants folder, ideally the file is named <your first name>.md
- open a pull request in the original repository and add someone else in the group to review your pull request
- review someone else's pull request, give feedback and approve if correct
- releases should be from main branch
- good practice is to open a pull request for develop into main when you are done with the desired features
- whenever you are ready for a new release, create a new release
- add a title and a description for your release and Choose a tag
- ideally, you choose a tag according to semantic versioning
- version tag should be MAJOR.MINOR.PATCH
- you increment one of the three depending on the change
- MAJOR: version when you make incompatible API changes
- MINOR: version when you add functionality in a backward compatible manner
- PATCH: version when you make backward compatible bug fixes
- when you are satisfied with your release, Publish release
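The MAJOR.MINOR.PATCH rules above can be sketched as a small helper that computes the next version tag. This is a minimal illustration, assuming plain numeric versions without pre-release or build metadata; `bump_version` is a hypothetical name.

```python
def bump_version(version, part):
    """Return the next version after incrementing 'major', 'minor' or 'patch'."""
    major, minor, patch = (int(number) for number in version.split("."))
    if part == "major":
        # incompatible API change: lower components reset to zero
        return f"{major + 1}.0.0"
    if part == "minor":
        # backward compatible functionality: patch resets to zero
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        # backward compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part}")
```

For example, a bug fix on top of 1.4.2 would be released as 1.4.3, a new backward compatible feature as 1.5.0, and a breaking change as 2.0.0.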
- let's discuss
- workflow manager that enables scalable and reproducible scientific workflows using software containers
- an extension of Groovy, which is an object-oriented programming language for the Java platform
- can be used with an array of executors, such as SLURM, k8s, AWS, Azure, Google Cloud and many more
- nf-core: project/community that develops framework for nextflow including guidelines, tools, modules, subworkflows, pipelines and test data
- POSIX compatible system (e.g. Linux, macOS)
- Bash
- Java ≥ 11 / ≤ 21
- Docker/Singularity
$ curl -s https://get.nextflow.io | bash
$ chmod +x nextflow
or
$ wget -O nextflow https://github.com/nextflow-io/nextflow/releases/download/v23.10.1/nextflow-23.10.1-all
or via browser at https://github.com/nextflow-io/nextflow/releases
workflow_repo
├── LICENSE
├── README.md
├── assets
│   ├── mock.genome.fasta
│   ├── samplesheet.csv
│   └── schema_input.json
├── bin
│   └── script.py
├── conf
│   ├── base.config
│   ├── modules.config
│   └── test_stub.config
├── lib
│   ├── NfcoreSchema.groovy
│   ├── NfcoreTemplate.groovy
│   ├── WorkflowMain.groovy
│   └── nfcore_external_java_deps.jar
├── main.nf
├── modules
│   ├── local
│   │   ├── module_1.nf
│   │   └── module_2.nf
│   └── nf-core
│       ├── module_1
│       │   └── arg_1
│       │       ├── main.nf
│       │       └── meta.yml
│       └── custom
│           └── dumpsoftwareversions
│               ├── main.nf
│               ├── meta.yml
│               └── templates
│                   └── dumpsoftwareversions.py
├── modules.json
├── nextflow.config
├── nextflow_schema.json
└── workflows
    └── main.nf
- modified nf-core template (removed unnecessary functionality, config and metadata files)
- added devcontainer to have controlled environment (dind and sind available)
- stubbing data available
- containing three modules so far (localapp_prepper, LocalApp, dumpsoftwareversions)
- using nf-validation plugin
- samplesheet_generator
- tsoppi (requires some restructuring)
- PRONTO
- include configuration files for each node
- Documentation
- consistency/standard
- keep main script short and sweet - functionality in modules
#!/usr/local/bin/python
from my_module import main

if __name__ == "__main__":
    main()
- module folder should contain __init__.py
- keep functions short and try to refactor big functions
- leave descriptive comments in code
- use libraries to make your life easier
  - pandas: csv/tsv files
  - click or argparse: define cli input flags
- introduce proper exception handling
- logging with log levels
- pytest for testing
- include unit tests for functions, preferably table-driven
def addition(x, y):
    return x+y

import pytest

@pytest.mark.parametrize("x, y, z", [(1, 1, 2), (1, -1, 0)])
def test_eval(x, y, z):
    assert addition(x, y) == z
$ pytest
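The earlier recommendations on CLI flags, exception handling and log levels could be combined roughly as follows. This is a sketch using only the standard library; the tool name `my_tool`, the `--input` and `--log-level` flags and the `count_lines` function are placeholders chosen for illustration, not part of any existing InPreD tool.

```python
import argparse
import logging

logger = logging.getLogger("my_tool")  # hypothetical tool name


def count_lines(path):
    """Count lines in a text file; stands in for the real functionality."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def main(argv=None):
    """Parse CLI flags, run the work and return an exit code."""
    parser = argparse.ArgumentParser(description="Count lines in a file")
    parser.add_argument("--input", required=True, help="path to input file")
    parser.add_argument("--log-level", default="INFO",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR"])
    args = parser.parse_args(argv)
    logging.basicConfig(level=args.log_level)

    try:
        n_lines = count_lines(args.input)
    except OSError as err:
        # report the problem at ERROR level instead of printing a traceback
        logger.error("could not read %s: %s", args.input, err)
        return 1
    logger.info("%s contains %d lines", args.input, n_lines)
    return 0
```

Because `main` accepts an argument list and returns an exit code, it can be called from a short entry script and also exercised directly in unit tests without spawning a subprocess.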
- include test data for unit testing if necessary
- create container image from project, preferably docker
- include all necessary dependencies in requirements.txt (locked versions)
- add GitHub actions for testing, linting, building, etc.
- preferably include a devcontainer definition
- README.md and other docs
/repo
|-- .devcontainer
|   `-- devcontainer.json
|-- .github
|   `-- workflows
|       `-- main.yml
|-- .gitignore
|-- Dockerfile
|-- docs
|-- LICENSE
|-- README.md
|-- my_tool.py
|-- my_module
|   |-- __init__.py
|   |-- my_module.py
|   `-- tests
|       |-- __init__.py
|       `-- my_module_test.py
|-- requirements.txt
`-- test