Commit

Update docs / changelog
BioWilko committed Nov 11, 2024
1 parent ba5b35b commit 21a1f7f
Showing 12 changed files with 154 additions and 285 deletions.
13 changes: 13 additions & 0 deletions CHANGELOG
Original file line number Diff line number Diff line change
@@ -1,3 +1,16 @@
v1.4.5:
* Fieldbioinformatics now supports primer trimming and normalisation for rapid-barcoded (fragmented) reads
* Nanopolish has been removed completely due to several compatibility issues
* Medaka has also been removed completely because it drops long indels in a way that cannot be configured
* Clair3 is now the default variant caller; by default only the r9.4.1 models are available, but an artic_get_models command has been added which fetches the ONT-created r10.4.1 models listed in the rerio repository
* The pipeline will also attempt to pick an appropriate model based on the basecall_model_version_id field that ONT sequencers add to read headers by default
* Removed longshot entirely; it also drops long variants and is now unnecessary since Clair3 is a much better variant caller
* The primer scheme fetcher has been updated to pull from the quick-lab primal hub schemes repository. For schemes not available in this repository, you may provide them directly with the --bed and --ref arguments
* Automated Docker builds now push to quay.io for use in Nextflow pipelines etc.
* Removed some old functionality which is no longer relevant (basecalling, gather, etc.)
* Re-implemented CI as a GitHub Action
* Fixed the overlapping-variants issue by normalising variants against the pre-consensus using bcftools norm
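The coverage normalisation mentioned above can be illustrated with a small sketch: reads are kept only while they still add coverage below the target depth. This is a hypothetical greedy implementation over in-memory `(start, end)` spans, not the pipeline's actual code, which operates on alignments.

```python
def normalise(reads, target_depth=100):
    """Greedily keep reads until every covered position reaches target_depth.

    `reads` is a list of (start, end) alignment spans -- an illustrative
    stand-in for real alignments, not the pipeline's actual implementation.
    """
    depth = {}
    kept = []
    for start, end in sorted(reads):
        span = range(start, end)
        # Keep the read only if it still contributes coverage below the target
        if any(depth.get(pos, 0) < target_depth for pos in span):
            kept.append((start, end))
            for pos in span:
                depth[pos] = depth.get(pos, 0) + 1
    return kept

reads = [(0, 10)] * 5 + [(5, 15)] * 5
print(len(normalise(reads, target_depth=3)))  # → 6
```

Greedy per-position capping like this keeps runtime roughly proportional to the target depth rather than the raw sequencing depth.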

v1.1.0-rc1:
* Support for read groups:
* Support ‘pool’ read groups taken from BED file, e.g.:
2 changes: 0 additions & 2 deletions README.md
@@ -25,8 +25,6 @@ Features include:
- variant calling
- consensus building

There are **2 workflows** baked into this pipeline, one which uses signal data (via [nanopolish](https://github.com/jts/nanopolish)) and one that does not (via [medaka](https://github.com/nanoporetech/medaka)).

<!-- ## Installation
### Via conda
5 changes: 1 addition & 4 deletions artic/pipeline.py
@@ -66,9 +66,6 @@ def init_pipeline_parser():
parser_minion = subparsers.add_parser(
"minion", help="Run the alignment/variant-call/consensus pipeline"
)
# parser_minion.add_argument(
# "scheme", metavar="scheme", help="The name of the scheme"
# )
parser_minion.add_argument(
"sample", metavar="sample", help="The name of the sample"
)
@@ -79,7 +76,7 @@ def init_pipeline_parser():
parser_minion.add_argument(
"--model-path",
metavar="model_path",
help="Path containing clair3 models, defaults to models packaged with conda installation",
help="Path containing clair3 models, defaults to models packaged with conda installation (default: $CONDA_PREFIX/bin/models/)",
type=str,
)
parser_minion.add_argument(
11 changes: 10 additions & 1 deletion artic/utils.py
@@ -784,7 +784,16 @@ def get_scheme_legacy(scheme_name, scheme_directory, scheme_version="1"):
raise SystemExit(1)


def choose_model(read_file: str):
def choose_model(read_file: str) -> dict:
"""
Choose the appropriate clair3 model based on the `basecall_model_version_id` field in the read header (if it exists)
Args:
read_file (str): Path to the fastq file
Returns:
dict: The chosen clair3 model as a dictionary
"""

models_class = clair3_manifest()
models = models_class.models
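The model auto-selection that `choose_model` performs relies on the `basecall_model_version_id` tag that ONT basecallers write into each read's FASTQ header. A minimal sketch of extracting that tag (the header string and helper name are illustrative, not the real manifest lookup):

```python
def parse_basecall_model(header: str):
    """Return the basecall_model_version_id value from a FASTQ header line,
    or None if the tag is absent. Illustrative helper, not artic's own code."""
    for field in header.split():
        if field.startswith("basecall_model_version_id="):
            return field.split("=", 1)[1]
    return None

# Hypothetical header in the key=value style ONT sequencers emit by default
header = (
    "@read-001 runid=abc123 "
    "basecall_model_version_id=dna_r10.4.1_e8.2_400bps_hac@v4.2.0"
)
print(parse_basecall_model(header))  # → dna_r10.4.1_e8.2_400bps_hac@v4.2.0
```

In the pipeline the extracted value would then be matched against the available Clair3 models, falling back to a user-supplied `--model` when no tag is present.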
124 changes: 6 additions & 118 deletions docs/commands.md
@@ -5,127 +5,13 @@ authors:
- Sam Wilkinson
- Will Rowe
- Nick Loman
date: 2024-08-16
date: 2024-11-11
---

# Commands

This page documents the available commands via the `artic` command line interface.

## demultiplex

### Overview

Run demultiplexing on a FASTA file

### Input

- undemultiplexed FASTA file

### Output

- demultiplexed FASTA file(s)

### Usage example

```bash
artic demultiplex <fasta>
```

| Argument name(s) | Required | Default value | Description |
| :-------------------- | :------- | :------------ | :----------------------------- |
| fasta | Y | NA | The undemultiplexed FASTA file |
| --threads | N | 8 | The number of threads |
| --prefix | N | NA | Prefix for demultiplexed files |
| --no-remove-directory | N | NA | Don't remove the directory |

---

## export

### Overview

The export command is used to make a redistributable package of data for re-analysis. This includes the FASTQ file, the sequencing summary and the FAST5 file. The selection of reads to be used comes from a BAM file, and only aligned reads are used.

### Input

- a completed minion pipeline run

### Output

- a redistributable package of data

### Usage example

```bash
artic export <prefix> <bamfile> <sequencing_summary> <fast5_directory> <output_directory>
```

| Argument name(s) | Required | Default value | Description |
| :----------------- | :------- | :------------ | :----------------------------------- |
| prefix | Y | NA | The run prefix |
| bamfile | Y | NA | The BAM file to export reads from |
| sequencing_summary | Y | NA | Path to Guppy sequencing summary |
| fast5_directory | Y | NA | The path to directory of FAST5 files |
| output_directory | Y | NA | The path to export the data to |

---

## extract

### Overview

Create an empty poredb database

### Input

- na

### Output

- an initialised poredb database

### Usage example

```bash
artic extract <directory>
```

| Argument name(s) | Required | Default value | Description |
| :--------------- | :------- | :------------------------------- | :------------------------- |
| directory | Y | NA | The name of the database |
| --basecaller     | N        | ONT Albacore Sequencing Software | The name of the basecaller |

---

## filter

### Overview

Filter FASTQ files by length

### Input

- unfiltered reads

### Output

- filtered reads

### Usage example

```bash
artic filter --max-length 500 --min-length 50 <filename>
```

| Argument name(s) | Required | Default value | Description |
| :--------------- | :------- | :------------ | :------------------------------------- |
| filename | Y | NA | The reads to filter |
| --max-length     | N        | NA            | Remove reads longer than max-length    |
| --min-length     | N        | NA            | Remove reads shorter than min-length   |
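The length filter above can be sketched in a few lines of Python. This is a simplified, hypothetical stand-in for the actual command, operating on an in-memory list of reads rather than a FASTQ file:

```python
def filter_reads(reads, min_length=None, max_length=None):
    """Keep reads whose sequence length falls within [min_length, max_length].

    `reads` is a list of (name, sequence) tuples; either bound may be None,
    mirroring the optional --min-length / --max-length arguments.
    """
    kept = []
    for name, seq in reads:
        if min_length is not None and len(seq) < min_length:
            continue  # too short
        if max_length is not None and len(seq) > max_length:
            continue  # too long
        kept.append((name, seq))
    return kept

reads = [("r1", "A" * 40), ("r2", "A" * 120), ("r3", "A" * 600)]
print([name for name, _ in filter_reads(reads, min_length=50, max_length=500)])
# → ['r2']
```

Length filtering like this removes both short artefactual fragments and chimeric over-length reads before alignment.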

---

## guppyplex

### Overview
@@ -182,14 +68,16 @@ artic minion <scheme> <sample>
| :------------------- | :------- | :------------- | :------------------------------------------------------------------------------------------- |
| scheme | Y | NA | The name of the primer scheme |
| sample | Y | NA | The name of the sample |
| --clair3 | N | False | Use clair3 instead of medaka for variants (experimental feature from v1.4.0) |
| --model | Y | NA | Medaka or Clair3 model to use |
| --normalise | N | 100 | Normalise down to moderate coverage to save runtime |
| --threads | N | 8 | Number of threads |
| --scheme-directory | N | /artic/schemes | Default scheme directory |
| --max-haplotypes | N | 1000000 | Max-haplotypes value for nanopolish |
| --scheme-name | N | | Name of scheme to fetch from the primerschemes repository |
| --scheme-length | N | | Length of scheme to fetch from the primerschemes repository |
| --scheme-version | N | | Version of the scheme to fetch from the primerschemes repository |
| --bed | N | | Bed file path |
| --ref | N | | Reference fasta path |
| --read-file | N | NA | Use alternative FASTA/FASTQ file to <sample>.fasta |
| --no-longshot | N | False | Use medaka variant instead of longshot (experimental feature from v1.2.0) |
| --min-mapq | Y | 20 | Remove reads which map to the reference with a lower mapping quality than this |
| --no-indels | N | False | Ignore insertions and deletions during variant calling, maintains the co-ordinates of the ref|
| --no-frameshifts | N | False | Do not allow frameshift variants (indels of lengths not divisible by 3) to be added to the consensus |
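The `--no-frameshifts` check in the table above boils down to rejecting indels whose length change is not a multiple of three. A hypothetical sketch of that test on VCF-style REF/ALT alleles:

```python
def is_frameshift(ref_allele: str, alt_allele: str) -> bool:
    """True if the REF/ALT length difference would shift the reading frame.

    Illustrative helper: an indel preserves the frame only when the length
    difference between alleles is divisible by 3.
    """
    return (len(ref_allele) - len(alt_allele)) % 3 != 0

print(is_frameshift("A", "ATT"))   # 2 bp insertion  → True
print(is_frameshift("AGGG", "A"))  # 3 bp deletion   → False
```

SNPs (equal-length alleles) trivially pass, so only indels are ever excluded by this filter.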
9 changes: 5 additions & 4 deletions docs/faq.md
@@ -4,18 +4,19 @@ summary: The FAQ.
authors:
- Will Rowe
- Nick Loman
- Sam Wilkinson
date: 2020-03-30
---

# FAQ

## Where can I find the SOP for SARS-CoV-2
## How do I process MPXV data?

The standard operating procedure for the ARTIC Network SARS-CoV-2 bioinformatics can be found [here](https://artic.network/ncov-2019/ncov2019-bioinformatics-sop.html).
A set of resources for processing MPXV sequencing data may be found [here](https://artic.network/mpxv); this includes running this pipeline on the command line and the ARTIC MPXV Nextflow pipelines via EPI2ME.

## Should I use the medaka or clair3 workflow
## Where can I find the SOP for SARS-CoV-2?

We currently recommend the medaka workflow as we have spent more time validating and supporting this workflow. That being said, both tend to give consistent results with our test datasets so the choice is yours.
The standard operating procedure for the ARTIC Network SARS-CoV-2 bioinformatics can be found [here](https://artic.network/ncov-2019/ncov2019-bioinformatics-sop.html).

## Lab-on-an-SSD

6 changes: 3 additions & 3 deletions docs/installation.md
@@ -4,12 +4,13 @@ summary: The installation guide.
authors:
- Will Rowe
- Nick Loman
- Sam Wilkinson
date: 2020-03-30
---

# Installation

As of [release 1.4.0](https://github.com/artic-network/fieldbioinformatics/releases/tag/1.4.0), conda installation of fieldbioinformatics will become difficult due to the mutually exclusive requirements of medaka and clair3, for this reason we recommend either utilising the docker image [available here](https://quay.io/repository/artic/fieldbioinformatics) or to build the package from source after installing the dependencies via Conda.
As of [release 1.4.0](https://github.com/artic-network/fieldbioinformatics/releases/tag/1.4.0), we provide a docker image [available here](https://quay.io/repository/artic/fieldbioinformatics) and a conda package. You may also wish to install the package from source after installing the dependencies via Conda yourself.

## Via conda

@@ -44,11 +45,10 @@ First check the pipeline can be called:
artic -v
```

To check that you have all the required dependencies, you can try the pipeline tests with both workflows:
To check that you have all the required dependencies, you can try the pipeline tests like so:

```
./test-runner.sh clair3
./test-runner.sh medaka
```

For further tests, such as the variant validation tests, see [here](http://artic.readthedocs.io/en/latest/tests?badge=latest).