Merge branch 'template_0924' into 'dev'
Template update

See merge request epi2melabs/workflows/wf-cas9!56
mattdmem committed Sep 13, 2024
2 parents 7811221 + 2e875e3 commit 10243fd
Showing 17 changed files with 1,110 additions and 196 deletions.
14 changes: 10 additions & 4 deletions .github/ISSUE_TEMPLATE/bug_report.yml
@@ -45,13 +45,19 @@ body:
label: Workflow Execution
description: Where are you running the workflow?
options:
- EPI2ME Desktop application
- Command line
- EPI2ME cloud agent
- EPI2ME Desktop (Local)
- EPI2ME Desktop (Cloud)
- Command line (Local)
- Command line (Cluster)
- Other (please describe)
validations:
required: true

- type: input
id: other-workflow-execution
attributes:
label: Other workflow execution
description: If "Other", please describe
placeholder: Tell us where / how you are running the workflow.

- type: markdown
attributes:
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -8,7 +8,7 @@ repos:
always_run: true
pass_filenames: false
additional_dependencies:
- epi2melabs>=0.0.52
- epi2melabs==0.0.57
- id: build_models
name: build_models
entry: datamodel-codegen --strict-nullable --base-class workflow_glue.results_schema_helpers.BaseModel --use-schema-description --disable-timestamp --input results_schema.yml --input-file-type openapi --output bin/workflow_glue/results_schema.py
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,10 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [v1.1.2]
### Changed
- Updated Ezcharts to v0.11.2.

## [v1.1.1]
### Changed
- The name of the column with run IDs from `run_ids` to `run_id`.
78 changes: 49 additions & 29 deletions README.md
@@ -43,37 +43,64 @@ ARM processor support: True

## Install and run

<!---Nextflow text remains the same across workflows, update example cmd and demo data sections.--->
These are instructions to install and run the workflow on command line. You can also access the workflow via the [EPI2ME application](https://labs.epi2me.io/downloads/).

The workflow uses [Nextflow](https://www.nextflow.io/) to manage compute and software resources, therefore nextflow will need to be installed before attempting to run the workflow.

The workflow can currently be run using either [Docker](https://www.docker.com/products/docker-desktop) or
[Singularity](https://docs.sylabs.io/guides/3.0/user-guide/index.html) to provide isolation of
the required software. Both methods are automated out-of-the-box provided
either docker or singularity is installed. This is controlled by the [`-profile`](https://www.nextflow.io/docs/latest/config.html#config-profiles) parameter as exemplified below.

It is not required to clone or download the git repository in order to run the workflow.
More information on running EPI2ME workflows can be found on our [website](https://labs.epi2me.io/wfindex).

The following command can be used to obtain the workflow. This will pull the repository in to the assets folder of nextflow and provide a list of all parameters available for the workflow as well as an example command:
These are instructions to install and run the workflow on command line.
You can also access the workflow via the
[EPI2ME Desktop application](https://labs.epi2me.io/downloads/).

The workflow uses [Nextflow](https://www.nextflow.io/) to manage
compute and software resources,
therefore Nextflow will need to be
installed before attempting to run the workflow.

The workflow can currently be run using either
[Docker](https://www.docker.com/products/docker-desktop)
or [Singularity](https://docs.sylabs.io/guides/3.0/user-guide/index.html)
to provide isolation of the required software.
Both methods are automated out-of-the-box provided
either Docker or Singularity is installed.
This is controlled by the
[`-profile`](https://www.nextflow.io/docs/latest/config.html#config-profiles)
parameter as exemplified below.
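
For illustration, a profile can also be defined in a local Nextflow configuration file and then selected with `-profile`. The fragment below is a hypothetical `my_run.config` (the `profiles` block and the `singularity` scope settings are standard Nextflow configuration, not something shipped with this workflow):

```
// my_run.config -- defines a profile that can be selected with:
//   nextflow -c my_run.config run epi2me-labs/wf-cas9 -profile singularity ...
profiles {
    singularity {
        singularity.enabled    = true
        singularity.autoMounts = true
    }
}
```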

It is not required to clone or download the git repository
in order to run the workflow.
More information on running EPI2ME workflows can
be found on our [website](https://labs.epi2me.io/wfindex).

The following command can be used to obtain the workflow.
This will pull the repository into the assets folder of
Nextflow and provide a list of all parameters
available for the workflow as well as an example command:

```
nextflow run epi2me-labs/wf-cas9 help
nextflow run epi2me-labs/wf-cas9 --help
```
A demo dataset is provided for testing of the workflow. It can be downloaded using:
To update a workflow to the latest version on the command line use
the following command:
```
wget https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-cas9/wf-cas9-demo.tar.gz \
&& tar -xvf wf-cas9-demo.tar.gz
nextflow pull epi2me-labs/wf-cas9
```
The workflow can be run with the demo data using:

A demo dataset is provided for testing of the workflow.
It can be downloaded and unpacked using the following commands:
```
wget https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-cas9/wf-cas9-demo.tar.gz
tar -xzvf wf-cas9-demo.tar.gz
```
The workflow can then be run with the downloaded demo data using:
```
nextflow run epi2me-labs/wf-cas9 \
--fastq wf-cas9-demo/fastq/ \
--reference_genome wf-cas9-demo/grch38/grch38_chr19_22.fa.gz \
--targets wf-cas9-demo/targets.bed
--fastq 'wf-cas9-demo/fastq/sample_1' \
--full_report \
--reference_genome 'wf-cas9-demo/grch38/grch38_chr19_22.fa.gz' \
--targets 'wf-cas9-demo/targets.bed' \
-profile standard
```
For further information about running a workflow on the cmd line see https://labs.epi2me.io/wfquickstart/

For further information about running a workflow on
the command line see https://labs.epi2me.io/wfquickstart/




@@ -145,13 +172,6 @@ input_reads.fastq ─── input_directory ─── input_directory
| threads | integer | Number of CPU threads to use per workflow task. | The total CPU resource used by the workflow is constrained by the executor configuration. | 8 |


### Miscellaneous Options

| Nextflow parameter name | Type | Description | Help | Default |
|--------------------------|------|-------------|------|---------|
| disable_ping | boolean | Enable to prevent sending a workflow ping. | | False |





32 changes: 21 additions & 11 deletions bin/workflow_glue/__init__.py
@@ -3,6 +3,7 @@
import glob
import importlib
import os
import sys

from .util import _log_level, get_main_logger # noqa: ABS101

@@ -11,15 +12,17 @@
_package_name = "workflow_glue"


def get_components():
def get_components(allowed_components=None):
"""Find a list of workflow command scripts."""
logger = get_main_logger(_package_name)
path = os.path.dirname(os.path.abspath(__file__))
components = list()
components = dict()
for fname in glob.glob(os.path.join(path, "*.py")):
name = os.path.splitext(os.path.basename(fname))[0]
if name in ("__init__", "util"):
continue
if allowed_components is not None and name not in allowed_components:
continue

# leniently attempt to import module
try:
@@ -34,14 +37,16 @@ def get_components():
try:
req = "main", "argparser"
if all(callable(getattr(mod, x)) for x in req):
components.append(name)
components[name] = mod
except Exception:
pass
return components


def cli():
"""Run workflow entry points."""
logger = get_main_logger(_package_name)
logger.info("Bootstrapping CLI.")
parser = argparse.ArgumentParser(
'wf-glue',
parents=[_log_level()],
@@ -56,16 +61,21 @@ def cli():
help='additional help', dest='command')
subparsers.required = True

# all component demos, plus some others
components = [
f'{_package_name}.{comp}' for comp in get_components()]
for module in components:
mod = importlib.import_module(module)
# importing everything can take time, try to shortcut
if len(sys.argv) > 1:
components = get_components(allowed_components=[sys.argv[1]])
if not sys.argv[1] in components:
logger.warn("Importing all modules, this may take some time.")
components = get_components()
else:
components = get_components()

# add all module parsers to main CLI
for name, module in components.items():
p = subparsers.add_parser(
module.split(".")[-1], parents=[mod.argparser()])
p.set_defaults(func=mod.main)
name.split(".")[-1], parents=[module.argparser()])
p.set_defaults(func=module.main)

logger = get_main_logger(_package_name)
args = parser.parse_args()

logger.info("Starting entrypoint.")
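
The reworked `cli()` above short-circuits module imports: when the first CLI argument names a known component, only that module is loaded, and anything unrecognised falls back to the full scan. A self-contained sketch of that pattern, with toy factory functions standing in for the real `workflow_glue` module imports:

```python
"""Sketch of the lazy component lookup in cli() above (toy stand-ins,
not the real workflow_glue modules)."""


def _load(name):
    # stands in for importlib.import_module, which is the slow part
    return f"<module {name}>"


_ALL = ("check_bam_headers_in_dir", "check_sample_sheet", "check_xam_index")


def get_components(allowed_components=None):
    """Map component name -> loaded component, optionally filtered."""
    return {
        name: _load(name)
        for name in _ALL
        if allowed_components is None or name in allowed_components
    }


def resolve(argv):
    """Load only the requested component; fall back to loading all."""
    if len(argv) > 1:
        components = get_components(allowed_components=[argv[1]])
        if argv[1] not in components:
            # unknown subcommand: load everything so argparse can
            # produce its usual error message
            components = get_components()
    else:
        components = get_components()
    return components
```

With a recognised subcommand, `resolve(["wf-glue", "check_xam_index"])` loads a single entry instead of all of them, which is the shortcut the logged warning refers to.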
29 changes: 21 additions & 8 deletions bin/workflow_glue/check_bam_headers_in_dir.py
@@ -8,11 +8,6 @@
from .util import get_named_logger, wf_parser # noqa: ABS101


def get_sq_lines(xam_file):
"""Extract the `@SQ` lines from the header of a XAM file."""
return pysam.AlignmentFile(xam_file, check_sq=False).header["SQ"]


def main(args):
"""Run the entry point."""
logger = get_named_logger("checkBamHdr")
@@ -27,10 +22,26 @@ def main(args):
# Set `is_unaligned` accordingly. If there are mixed headers (either with some files
# containing `@SQ` lines and some not or with different files containing different
# `@SQ` lines), set `mixed_headers` to `True`.
# Also check if there is the SO line, to validate whether the file is (un)sorted.
first_sq_lines = None
mixed_headers = False
sorted_xam = False
for xam_file in target_files:
sq_lines = get_sq_lines(xam_file)
# get the `@SQ` and `@HD` lines in the header
with pysam.AlignmentFile(xam_file, check_sq=False) as f:
# compare only the SN/LN/M5 elements of SQ to avoid labelling XAM with
# same reference but different SQ.UR as mixed_header (see CW-4842)
sq_lines = [{
"SN": sq["SN"],
"LN": sq["LN"],
"M5": sq.get("M5"),
} for sq in f.header.get("SQ", [])]
hd_lines = f.header.get("HD")
# Check if it is sorted.
# When there is more than one BAM, merging/sorting
# will happen regardless of this flag.
if hd_lines is not None and hd_lines.get('SO') == 'coordinate':
sorted_xam = True
if first_sq_lines is None:
# this is the first file
first_sq_lines = sq_lines
@@ -46,13 +57,15 @@ def main(args):
# write `is_unaligned` and `mixed_headers` out so that they can be set as env.
# variables
sys.stdout.write(
f"IS_UNALIGNED={int(is_unaligned)};MIXED_HEADERS={int(mixed_headers)}"
f"IS_UNALIGNED={int(is_unaligned)};" +
f"MIXED_HEADERS={int(mixed_headers)};" +
f"IS_SORTED={int(sorted_xam)}"
)
logger.info(f"Checked (u)BAM headers in '{args.input_path}'.")


def argparser():
"""Argument parser for entrypoint."""
parser = wf_parser("check_bam_headers")
parser = wf_parser("check_bam_headers_in_dir")
parser.add_argument("input_path", type=Path, help="Path to target directory")
return parser
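
The comparison above deliberately keys on only the `SN`/`LN`/`M5` fields of each `@SQ` record, so two BAMs aligned to the same reference stored at different paths (differing `SQ.UR`) are not reported as mixed (CW-4842). A pysam-free sketch of that normalisation, using plain dicts in place of parsed headers:

```python
"""Sketch of the @SQ comparison in check_bam_headers_in_dir.py: only
SN/LN/M5 identify a reference sequence, so headers differing only in
UR (e.g. local vs remote reference paths) compare as equal."""


def normalise_sq(sq_records):
    """Keep only the fields that identify the reference sequences."""
    return [
        {"SN": sq["SN"], "LN": sq["LN"], "M5": sq.get("M5")}
        for sq in sq_records
    ]


def have_mixed_headers(headers):
    """Return True if any two headers disagree after normalisation."""
    normalised = [normalise_sq(h) for h in headers]
    return any(h != normalised[0] for h in normalised[1:])
```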
28 changes: 28 additions & 0 deletions bin/workflow_glue/check_sample_sheet.py
@@ -38,6 +38,7 @@ def main(args):
barcodes = []
aliases = []
sample_types = []
analysis_groups = []
allowed_sample_types = [
"test_sample", "positive_control", "negative_control", "no_template_control"
]
@@ -49,6 +50,21 @@
try:
encoding = determine_codec(args.sample_sheet)
with open(args.sample_sheet, "r", encoding=encoding) as f:
try:
# Excel files don't throw any error until here
csv.Sniffer().sniff(f.readline())
f.seek(0) # return to initial position again
except Exception as e:
# Excel fails with UniCode error
sys.stdout.write(
"The sample sheet doesn't seem to be a CSV file.\n"
"The sample sheet has to be a CSV file.\n"
"Please verify that the sample sheet is a CSV file.\n"
f"Parsing error: {e}"
)

sys.exit()

csv_reader = csv.DictReader(f)
n_row = 0
for row in csv_reader:
@@ -76,6 +92,10 @@
sample_types.append(row["type"])
except KeyError:
pass
try:
analysis_groups.append(row["analysis_group"])
except KeyError:
pass
except Exception as e:
sys.stdout.write(f"Parsing error: {e}")
sys.exit()
@@ -121,6 +141,14 @@
sys.stdout.write(
f"Sample sheet requires at least 1 of {required_type}")
sys.exit()
if analysis_groups:
# if there was a "analysis_group" column, make sure it had values for all
# samples
if not all(analysis_groups):
sys.stdout.write(
"if an 'analysis_group' column exists, it needs values in each row"
)
sys.exit()

logger.info(f"Checked sample sheet {args.sample_sheet}.")

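
The two checks added to `check_sample_sheet.py` above can be exercised on their own: `csv.Sniffer` rejects input that is not delimited text, and an `analysis_group` column, when present, must be filled in every row. A minimal sketch with hypothetical helper names, reading from in-memory strings instead of a file:

```python
"""Sketch of the two new sample-sheet checks: dialect sniffing and
analysis_group completeness. Helper names are illustrative only."""
import csv
import io


def looks_like_csv(text):
    """Return True if csv.Sniffer can detect a dialect in the first line."""
    try:
        csv.Sniffer().sniff(io.StringIO(text).readline())
        return True
    except csv.Error:
        # e.g. "Could not determine delimiter" for non-delimited input
        return False


def analysis_groups_complete(text):
    """True when there is no analysis_group column, or every row has a value."""
    groups = [
        row["analysis_group"]
        for row in csv.DictReader(io.StringIO(text))
        if "analysis_group" in row
    ]
    return all(groups)
```

The real script catches a broader `Exception`, since an Excel file opened in text mode can fail with a `UnicodeDecodeError` before the sniffer even runs.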
43 changes: 43 additions & 0 deletions bin/workflow_glue/check_xam_index.py
@@ -0,0 +1,43 @@
"""Validate a single (u)BAM file index."""

from pathlib import Path
import sys

import pysam

from .util import get_named_logger, wf_parser # noqa: ABS101


def validate_xam_index(xam_file):
"""Use fetch to validate the index.
Invalid indexes will fail the call with a ValueError:
ValueError: fetch called on bamfile without index
"""
with pysam.AlignmentFile(xam_file, check_sq=False) as alignments:
try:
alignments.fetch()
has_valid_index = True
except ValueError:
has_valid_index = False
return has_valid_index


def main(args):
"""Run the entry point."""
logger = get_named_logger("checkBamIdx")

# Check if a XAM has a valid index
has_valid_index = validate_xam_index(args.input_xam)
# write `has_valid_index` out so that they can be set as env.
sys.stdout.write(
f"HAS_VALID_INDEX={int(has_valid_index)}"
)
logger.info(f"Checked (u)BAM index for: '{args.input_xam}'.")


def argparser():
"""Argument parser for entrypoint."""
parser = wf_parser("check_xam_index")
parser.add_argument("input_xam", type=Path, help="Path to target XAM")
return parser
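
Both checker scripts communicate with the pipeline through a small `KEY=int;KEY=int` contract on stdout (e.g. `IS_UNALIGNED=0;MIXED_HEADERS=0;IS_SORTED=1`, or `HAS_VALID_INDEX=1` here). A sketch of a consuming side, hypothetical since the real workflow captures these as shell environment variables rather than parsing them in Python:

```python
def parse_flags(stdout_text):
    """Parse the 'KEY=1;KEY2=0' pairs emitted by the checker scripts
    into a dict of booleans."""
    return {
        key: bool(int(value))
        for key, value in (
            pair.split("=") for pair in stdout_text.split(";")
        )
    }
```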
