Commit

Update docs / changelog
BioWilko committed Nov 11, 2024
1 parent ba5b35b commit 21a1f7f
Showing 12 changed files with 154 additions and 285 deletions.
13 changes: 13 additions & 0 deletions CHANGELOG
Original file line number Diff line number Diff line change
@@ -1,3 +1,16 @@
v1.4.5:
* Fieldbioinformatics now supports primer trimming and normalisation for rapid-barcoded (fragmented) reads
* Nanopolish has been removed completely due to several compatibility issues
* Medaka has also been removed completely because it drops long indels in a way that cannot be configured
* Clair3 is now the default variant caller; by default only the r9.4.1 models are available, but an artic_get_models command has been added which fetches the ONT-created r10.4.1 models listed in the rerio repository
* The pipeline will also attempt to pick an appropriate model based on the basecall_model_version_id field that ONT sequencers add to read headers by default
* Removed longshot entirely; it also drops long variants and is now unnecessary since Clair3 is a much better variant caller
* The primer scheme fetcher has been updated to pull from the quick-lab primal hub schemes repository. For schemes not available in this repository, you may provide them directly with the --bed and --ref arguments
* Automated Docker builds now push to quay.io for use in Nextflow pipelines etc.
* Removed some old functionality which is no longer relevant (basecalling, gather, etc.)
* Re-implemented CI as a GitHub Action
* Fixed the overlapping-variants issue by normalising variants against the pre-consensus using bcftools norm
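The coverage normalisation mentioned above can be illustrated with a small sketch: reads are kept only while they still add coverage below the target depth. This is a hypothetical greedy implementation over in-memory `(start, end)` spans, not the pipeline's actual code, which operates on alignments.

```python
def normalise(reads, target_depth=100):
    """Greedily keep reads until every covered position reaches target_depth.

    `reads` is a list of (start, end) alignment spans -- an illustrative
    stand-in for real alignments, not the pipeline's actual implementation.
    """
    depth = {}
    kept = []
    for start, end in sorted(reads):
        span = range(start, end)
        # Keep the read only if it still contributes coverage below the target
        if any(depth.get(pos, 0) < target_depth for pos in span):
            kept.append((start, end))
            for pos in span:
                depth[pos] = depth.get(pos, 0) + 1
    return kept

reads = [(0, 10)] * 5 + [(5, 15)] * 5
print(len(normalise(reads, target_depth=3)))  # → 6
```

Greedy per-position capping like this keeps runtime roughly proportional to the target depth rather than the raw sequencing depth.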

v1.1.0-rc1:
* Support for read groups:
* Support ‘pool’ read groups taken from BED file, e.g.:
2 changes: 0 additions & 2 deletions README.md
@@ -25,8 +25,6 @@ Features include:
- variant calling
- consensus building

There are **2 workflows** baked into this pipeline, one which uses signal data (via [nanopolish](https://github.com/jts/nanopolish)) and one that does not (via [medaka](https://github.com/nanoporetech/medaka)).

<!-- ## Installation
### Via conda
5 changes: 1 addition & 4 deletions artic/pipeline.py
@@ -66,9 +66,6 @@ def init_pipeline_parser():
parser_minion = subparsers.add_parser(
"minion", help="Run the alignment/variant-call/consensus pipeline"
)
# parser_minion.add_argument(
# "scheme", metavar="scheme", help="The name of the scheme"
# )
parser_minion.add_argument(
"sample", metavar="sample", help="The name of the sample"
)
@@ -79,7 +76,7 @@ def init_pipeline_parser():
parser_minion.add_argument(
"--model-path",
metavar="model_path",
help="Path containing clair3 models, defaults to models packaged with conda installation",
help="Path containing clair3 models, defaults to models packaged with conda installation (default: $CONDA_PREFIX/bin/models/)",
type=str,
)
parser_minion.add_argument(
11 changes: 10 additions & 1 deletion artic/utils.py
@@ -784,7 +784,16 @@ def get_scheme_legacy(scheme_name, scheme_directory, scheme_version="1"):
raise SystemExit(1)


def choose_model(read_file: str):
def choose_model(read_file: str) -> dict:
"""
Choose the appropriate clair3 model based on the `basecall_model_version_id` field in the read header (if it exists)
Args:
read_file (str): Path to the fastq file
Returns:
dict: The chosen clair3 model as a dictionary
"""

models_class = clair3_manifest()
models = models_class.models
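The model auto-selection that `choose_model` performs relies on the `basecall_model_version_id` tag that ONT basecallers write into each read's FASTQ header. A minimal sketch of extracting that tag (the header string and helper name are illustrative, not the real manifest lookup):

```python
def parse_basecall_model(header: str):
    """Return the basecall_model_version_id value from a FASTQ header line,
    or None if the tag is absent. Illustrative helper, not artic's own code."""
    for field in header.split():
        if field.startswith("basecall_model_version_id="):
            return field.split("=", 1)[1]
    return None

# Hypothetical header in the key=value style ONT sequencers emit by default
header = (
    "@read-001 runid=abc123 "
    "basecall_model_version_id=dna_r10.4.1_e8.2_400bps_hac@v4.2.0"
)
print(parse_basecall_model(header))  # → dna_r10.4.1_e8.2_400bps_hac@v4.2.0
```

In the pipeline the extracted value would then be matched against the available Clair3 models, falling back to a user-supplied `--model` when no tag is present.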
124 changes: 6 additions & 118 deletions docs/commands.md
@@ -5,127 +5,13 @@ authors:
- Sam Wilkinson
- Will Rowe
- Nick Loman
date: 2024-08-16
date: 2024-11-11
---

# Commands

This page documents the available commands via the `artic` command line interface.

## demultiplex

### Overview

Run demultiplexing on a FASTA file

### Input

- undemultiplexed FASTA file

### Output

- demultiplexed FASTA file(s)

### Usage example

```bash
artic demultiplex <fasta>
```

| Argument name(s) | Required | Default value | Description |
| :-------------------- | :------- | :------------ | :----------------------------- |
| fasta | Y | NA | The undemultiplexed FASTA file |
| --threads | N | 8 | The number of threads |
| --prefix | N | NA | Prefix for demultiplexed files |
| --no-remove-directory | N | NA | Don't remove the directory |

---

## export

### Overview

The export command is used to make a redistributable package of data for re-analysis. This includes the FASTQ file, the sequencing summary and the FAST5 file. The selection of reads to be used comes from a BAM file, and only aligned reads are used.

### Input

- a completed minion pipeline run

### Output

- a redistributable package of data

### Usage example

```bash
artic export <prefix> <bamfile> <sequencing_summary> <fast5_directory> <output_directory>
```

| Argument name(s) | Required | Default value | Description |
| :----------------- | :------- | :------------ | :----------------------------------- |
| prefix | Y | NA | The run prefix |
| bamfile | Y | NA | The BAM file to export reads from |
| sequencing_summary | Y | NA | Path to Guppy sequencing summary |
| fast5_directory | Y | NA | The path to directory of FAST5 files |
| output_directory | Y | NA | The path to export the data to |

---

## extract

### Overview

Create an empty poredb database

### Input

- na

### Output

- an initialised poredb database

### Usage example

```bash
artic extract <directory>
```

| Argument name(s) | Required | Default value | Description |
| :--------------- | :------- | :------------------------------- | :------------------------- |
| directory | Y | NA | The name of the database |
| --basecaller     | N        | ONT Albacore Sequencing Software | The name of the basecaller |

---

## filter

### Overview

Filter FASTQ files by length

### Input

- unfiltered reads

### Output

- filtered reads

### Usage example

```bash
artic filter --max-length 500 --min-length 50 <filename>
```

| Argument name(s) | Required | Default value | Description |
| :--------------- | :------- | :------------ | :------------------------------------- |
| filename | Y | NA | The reads to filter |
| --max-length     | N        | NA            | Remove reads longer than max-length    |
| --min-length     | N        | NA            | Remove reads shorter than min-length   |
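The length filter above can be sketched in a few lines of Python. This is a simplified, hypothetical stand-in for the actual command, operating on an in-memory list of reads rather than a FASTQ file:

```python
def filter_reads(reads, min_length=None, max_length=None):
    """Keep reads whose sequence length falls within [min_length, max_length].

    `reads` is a list of (name, sequence) tuples; either bound may be None,
    mirroring the optional --min-length / --max-length arguments.
    """
    kept = []
    for name, seq in reads:
        if min_length is not None and len(seq) < min_length:
            continue  # too short
        if max_length is not None and len(seq) > max_length:
            continue  # too long
        kept.append((name, seq))
    return kept

reads = [("r1", "A" * 40), ("r2", "A" * 120), ("r3", "A" * 600)]
print([name for name, _ in filter_reads(reads, min_length=50, max_length=500)])
# → ['r2']
```

Length filtering like this removes both short artefactual fragments and chimeric over-length reads before alignment.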

---

## guppyplex

### Overview
@@ -182,14 +68,16 @@ artic minion <scheme> <sample>
| :------------------- | :------- | :------------- | :------------------------------------------------------------------------------------------- |
| scheme | Y | NA | The name of the primer scheme |
| sample | Y | NA | The name of the sample |
| --clair3 | N | False | Use clair3 instead of medaka for variants (experimental feature from v1.4.0) |
| --model | Y | NA | Medaka or Clair3 model to use |
| --normalise | N | 100 | Normalise down to moderate coverage to save runtime |
| --threads | N | 8 | Number of threads |
| --scheme-directory | N | /artic/schemes | Default scheme directory |
| --max-haplotypes | N | 1000000 | Max-haplotypes value for nanopolish |
| --scheme-name | N | | Name of scheme to fetch from the primerschemes repository |
| --scheme-length | N | | Length of scheme to fetch from the primerschemes repository |
| --scheme-version | N | | Version of the scheme to fetch from the primerschemes repository |
| --bed | N | | Bed file path |
| --ref | N | | Reference fasta path |
| --read-file | N | NA | Use alternative FASTA/FASTQ file to <sample>.fasta |
| --no-longshot | N | False | Use medaka variant instead of longshot (experimental feature from v1.2.0) |
| --min-mapq | Y | 20 | Remove reads which map to the reference with a lower mapping quality than this |
| --no-indels | N | False | Ignore insertions and deletions during variant calling, maintains the co-ordinates of the ref|
| --no-frameshifts | N | False | Do not allow frameshift variants (indels of lengths not divisible by 3) to be added to the consensus |
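The `--no-frameshifts` check in the table above boils down to rejecting indels whose length change is not a multiple of three. A hypothetical sketch of that test on VCF-style REF/ALT alleles:

```python
def is_frameshift(ref_allele: str, alt_allele: str) -> bool:
    """True if the REF/ALT length difference would shift the reading frame.

    Illustrative helper: an indel preserves the frame only when the length
    difference between alleles is divisible by 3.
    """
    return (len(ref_allele) - len(alt_allele)) % 3 != 0

print(is_frameshift("A", "ATT"))   # 2 bp insertion  → True
print(is_frameshift("AGGG", "A"))  # 3 bp deletion   → False
```

SNPs (equal-length alleles) trivially pass, so only indels are ever excluded by this filter.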
9 changes: 5 additions & 4 deletions docs/faq.md
@@ -4,18 +4,19 @@ summary: The FAQ.
authors:
- Will Rowe
- Nick Loman
- Sam Wilkinson
date: 2020-03-30
---

# FAQ

## Where can I find the SOP for SARS-CoV-2
## How do I process MPXV data?

The standard operating procedure for the ARTIC Network SARS-CoV-2 bioinformatics can be found [here](https://artic.network/ncov-2019/ncov2019-bioinformatics-sop.html).
A set of resources for processing MPXV sequencing data may be found [here](https://artic.network/mpxv); this includes running this pipeline on the command line and the ARTIC MPXV Nextflow pipelines via EPI2ME.

## Should I use the medaka or clair3 workflow
## Where can I find the SOP for SARS-CoV-2?

We currently recommend the medaka workflow as we have spent more time validating and supporting this workflow. That being said, both tend to give consistent results with our test datasets so the choice is yours.
The standard operating procedure for the ARTIC Network SARS-CoV-2 bioinformatics can be found [here](https://artic.network/ncov-2019/ncov2019-bioinformatics-sop.html).

## Lab-on-an-SSD

6 changes: 3 additions & 3 deletions docs/installation.md
@@ -4,12 +4,13 @@ summary: The installation guide.
authors:
- Will Rowe
- Nick Loman
- Sam Wilkinson
date: 2020-03-30
---

# Installation

As of [release 1.4.0](https://github.com/artic-network/fieldbioinformatics/releases/tag/1.4.0), conda installation of fieldbioinformatics will become difficult due to the mutually exclusive requirements of medaka and clair3, for this reason we recommend either utilising the docker image [available here](https://quay.io/repository/artic/fieldbioinformatics) or to build the package from source after installing the dependencies via Conda.
As of [release 1.4.0](https://github.com/artic-network/fieldbioinformatics/releases/tag/1.4.0), we provide a docker image [available here](https://quay.io/repository/artic/fieldbioinformatics) and a conda package. You may also wish to install the package from source after installing the dependencies via Conda yourself.

## Via conda

@@ -44,11 +45,10 @@ First check the pipeline can be called:
artic -v
```

To check that you have all the required dependencies, you can try the pipeline tests with both workflows:
To check that you have all the required dependencies, you can try the pipeline tests like so:

```
./test-runner.sh clair3
./test-runner.sh medaka
```

For further tests, such as the variant validation tests, see [here](http://artic.readthedocs.io/en/latest/tests?badge=latest).