Merge branch 'template_0924' into 'dev'
Template update

See merge request epi2melabs/workflows/wf-cas9!56
mattdmem committed Sep 13, 2024
2 parents 7811221 + 2e875e3 commit 10243fd
Showing 17 changed files with 1,110 additions and 196 deletions.
14 changes: 10 additions & 4 deletions .github/ISSUE_TEMPLATE/bug_report.yml
@@ -45,13 +45,19 @@ body:
label: Workflow Execution
description: Where are you running the workflow?
options:
- EPI2ME Desktop application
- Command line
- EPI2ME cloud agent
- EPI2ME Desktop (Local)
- EPI2ME Desktop (Cloud)
- Command line (Local)
- Command line (Cluster)
- Other (please describe)
validations:
required: true

- type: input
id: other-workflow-execution
attributes:
label: Other workflow execution
description: If "Other", please describe
placeholder: Tell us where / how you are running the workflow.

- type: markdown
attributes:
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -8,7 +8,7 @@ repos:
always_run: true
pass_filenames: false
additional_dependencies:
- epi2melabs>=0.0.52
- epi2melabs==0.0.57
- id: build_models
name: build_models
entry: datamodel-codegen --strict-nullable --base-class workflow_glue.results_schema_helpers.BaseModel --use-schema-description --disable-timestamp --input results_schema.yml --input-file-type openapi --output bin/workflow_glue/results_schema.py
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,10 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [v1.1.2]
### Changed
- Updated Ezcharts to v0.11.2.

## [v1.1.1]
### Changed
- The name of the column with run IDs from `run_ids` to `run_id`.
78 changes: 49 additions & 29 deletions README.md
@@ -43,37 +43,64 @@ ARM processor support: True

## Install and run

<!---Nextflow text remains the same across workflows, update example cmd and demo data sections.--->
These are instructions to install and run the workflow on command line. You can also access the workflow via the [EPI2ME application](https://labs.epi2me.io/downloads/).

The workflow uses [Nextflow](https://www.nextflow.io/) to manage compute and software resources, therefore nextflow will need to be installed before attempting to run the workflow.

The workflow can currently be run using either [Docker](https://www.docker.com/products/docker-desktop) or
[Singularity](https://docs.sylabs.io/guides/3.0/user-guide/index.html) to provide isolation of
the required software. Both methods are automated out-of-the-box provided
either docker or singularity is installed. This is controlled by the [`-profile`](https://www.nextflow.io/docs/latest/config.html#config-profiles) parameter as exemplified below.

It is not required to clone or download the git repository in order to run the workflow.
More information on running EPI2ME workflows can be found on our [website](https://labs.epi2me.io/wfindex).

The following command can be used to obtain the workflow. This will pull the repository in to the assets folder of nextflow and provide a list of all parameters available for the workflow as well as an example command:
These are instructions to install and run the workflow on command line.
You can also access the workflow via the
[EPI2ME Desktop application](https://labs.epi2me.io/downloads/).

The workflow uses [Nextflow](https://www.nextflow.io/) to manage
compute and software resources,
therefore Nextflow will need to be
installed before attempting to run the workflow.

The workflow can currently be run using either
[Docker](https://www.docker.com/products/docker-desktop)
or [Singularity](https://docs.sylabs.io/guides/3.0/user-guide/index.html)
to provide isolation of the required software.
Both methods are automated out-of-the-box provided
either Docker or Singularity is installed.
This is controlled by the
[`-profile`](https://www.nextflow.io/docs/latest/config.html#config-profiles)
parameter as exemplified below.
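
For illustration, a profile can also be defined in a local Nextflow configuration file and then selected with `-profile`. The fragment below is a hypothetical `my_run.config` (the `profiles` block and the `singularity` scope settings are standard Nextflow configuration, not something shipped with this workflow):

```
// my_run.config -- defines a profile that can be selected with:
//   nextflow -c my_run.config run epi2me-labs/wf-cas9 -profile singularity ...
profiles {
    singularity {
        singularity.enabled    = true
        singularity.autoMounts = true
    }
}
```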

It is not required to clone or download the git repository
in order to run the workflow.
More information on running EPI2ME workflows can
be found on our [website](https://labs.epi2me.io/wfindex).

The following command can be used to obtain the workflow.
This will pull the repository into the assets folder of
Nextflow and provide a list of all parameters
available for the workflow as well as an example command:

```
nextflow run epi2me-labs/wf-cas9 help
nextflow run epi2me-labs/wf-cas9 --help
```
A demo dataset is provided for testing of the workflow. It can be downloaded using:
To update a workflow to the latest version on the command line use
the following command:
```
wget https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-cas9/wf-cas9-demo.tar.gz \
&& tar -xvf wf-cas9-demo.tar.gz
nextflow pull epi2me-labs/wf-cas9
```
The workflow can be run with the demo data using:

A demo dataset is provided for testing of the workflow.
It can be downloaded and unpacked using the following commands:
```
wget https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-cas9/wf-cas9-demo.tar.gz
tar -xzvf wf-cas9-demo.tar.gz
```
The workflow can then be run with the downloaded demo data using:
```
nextflow run epi2me-labs/wf-cas9 \
--fastq wf-cas9-demo/fastq/ \
--reference_genome wf-cas9-demo/grch38/grch38_chr19_22.fa.gz \
--targets wf-cas9-demo/targets.bed
--fastq 'wf-cas9-demo/fastq/sample_1' \
--full_report \
--reference_genome 'wf-cas9-demo/grch38/grch38_chr19_22.fa.gz' \
--targets 'wf-cas9-demo/targets.bed' \
-profile standard
```
For further information about running a workflow on the cmd line see https://labs.epi2me.io/wfquickstart/

For further information about running a workflow on
the command line see https://labs.epi2me.io/wfquickstart/




@@ -145,13 +172,6 @@ input_reads.fastq ─── input_directory ─── input_directory
| threads | integer | Number of CPU threads to use per workflow task. | The total CPU resource used by the workflow is constrained by the executor configuration. | 8 |


### Miscellaneous Options

| Nextflow parameter name | Type | Description | Help | Default |
|--------------------------|------|-------------|------|---------|
| disable_ping | boolean | Enable to prevent sending a workflow ping. | | False |





32 changes: 21 additions & 11 deletions bin/workflow_glue/__init__.py
@@ -3,6 +3,7 @@
import glob
import importlib
import os
import sys

from .util import _log_level, get_main_logger # noqa: ABS101

@@ -11,15 +12,17 @@
_package_name = "workflow_glue"


def get_components():
def get_components(allowed_components=None):
"""Find a list of workflow command scripts."""
logger = get_main_logger(_package_name)
path = os.path.dirname(os.path.abspath(__file__))
components = list()
components = dict()
for fname in glob.glob(os.path.join(path, "*.py")):
name = os.path.splitext(os.path.basename(fname))[0]
if name in ("__init__", "util"):
continue
if allowed_components is not None and name not in allowed_components:
continue

# leniently attempt to import module
try:
@@ -34,14 +37,16 @@ def get_components():
try:
req = "main", "argparser"
if all(callable(getattr(mod, x)) for x in req):
components.append(name)
components[name] = mod
except Exception:
pass
return components


def cli():
"""Run workflow entry points."""
logger = get_main_logger(_package_name)
logger.info("Bootstrapping CLI.")
parser = argparse.ArgumentParser(
'wf-glue',
parents=[_log_level()],
@@ -56,16 +61,21 @@ def cli():
help='additional help', dest='command')
subparsers.required = True

# all component demos, plus some others
components = [
f'{_package_name}.{comp}' for comp in get_components()]
for module in components:
mod = importlib.import_module(module)
# importing everything can take time, try to shortcut
if len(sys.argv) > 1:
components = get_components(allowed_components=[sys.argv[1]])
if not sys.argv[1] in components:
logger.warn("Importing all modules, this may take some time.")
components = get_components()
else:
components = get_components()

# add all module parsers to main CLI
for name, module in components.items():
p = subparsers.add_parser(
module.split(".")[-1], parents=[mod.argparser()])
p.set_defaults(func=mod.main)
name.split(".")[-1], parents=[module.argparser()])
p.set_defaults(func=module.main)

logger = get_main_logger(_package_name)
args = parser.parse_args()

logger.info("Starting entrypoint.")
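
The reworked `cli()` above short-circuits module imports: when the first CLI argument names a known component, only that module is loaded, and anything unrecognised falls back to the full scan. A self-contained sketch of that pattern, with toy factory functions standing in for the real `workflow_glue` module imports:

```python
"""Sketch of the lazy component lookup in cli() above (toy stand-ins,
not the real workflow_glue modules)."""


def _load(name):
    # stands in for importlib.import_module, which is the slow part
    return f"<module {name}>"


_ALL = ("check_bam_headers_in_dir", "check_sample_sheet", "check_xam_index")


def get_components(allowed_components=None):
    """Map component name -> loaded component, optionally filtered."""
    return {
        name: _load(name)
        for name in _ALL
        if allowed_components is None or name in allowed_components
    }


def resolve(argv):
    """Load only the requested component; fall back to loading all."""
    if len(argv) > 1:
        components = get_components(allowed_components=[argv[1]])
        if argv[1] not in components:
            # unknown subcommand: load everything so argparse can
            # produce its usual error message
            components = get_components()
    else:
        components = get_components()
    return components
```

With a recognised subcommand, `resolve(["wf-glue", "check_xam_index"])` loads a single entry instead of all of them, which is the shortcut the logged warning refers to.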
29 changes: 21 additions & 8 deletions bin/workflow_glue/check_bam_headers_in_dir.py
@@ -8,11 +8,6 @@
from .util import get_named_logger, wf_parser # noqa: ABS101


def get_sq_lines(xam_file):
"""Extract the `@SQ` lines from the header of a XAM file."""
return pysam.AlignmentFile(xam_file, check_sq=False).header["SQ"]


def main(args):
"""Run the entry point."""
logger = get_named_logger("checkBamHdr")
@@ -27,10 +22,26 @@ def main(args):
# Set `is_unaligned` accordingly. If there are mixed headers (either with some files
# containing `@SQ` lines and some not or with different files containing different
# `@SQ` lines), set `mixed_headers` to `True`.
# Also check if there is the SO line, to validate whether the file is (un)sorted.
first_sq_lines = None
mixed_headers = False
sorted_xam = False
for xam_file in target_files:
sq_lines = get_sq_lines(xam_file)
# get the `@SQ` and `@HD` lines in the header
with pysam.AlignmentFile(xam_file, check_sq=False) as f:
# compare only the SN/LN/M5 elements of SQ to avoid labelling XAM with
# same reference but different SQ.UR as mixed_header (see CW-4842)
sq_lines = [{
"SN": sq["SN"],
"LN": sq["LN"],
"M5": sq.get("M5"),
} for sq in f.header.get("SQ", [])]
hd_lines = f.header.get("HD")
# Check if it is sorted.
# When there is more than one BAM, merging/sorting
# will happen regardless of this flag.
if hd_lines is not None and hd_lines.get('SO') == 'coordinate':
sorted_xam = True
if first_sq_lines is None:
# this is the first file
first_sq_lines = sq_lines
@@ -46,13 +57,15 @@ def main(args):
# write `is_unaligned` and `mixed_headers` out so that they can be set as env.
# variables
sys.stdout.write(
f"IS_UNALIGNED={int(is_unaligned)};MIXED_HEADERS={int(mixed_headers)}"
f"IS_UNALIGNED={int(is_unaligned)};" +
f"MIXED_HEADERS={int(mixed_headers)};" +
f"IS_SORTED={int(sorted_xam)}"
)
logger.info(f"Checked (u)BAM headers in '{args.input_path}'.")


def argparser():
"""Argument parser for entrypoint."""
parser = wf_parser("check_bam_headers")
parser = wf_parser("check_bam_headers_in_dir")
parser.add_argument("input_path", type=Path, help="Path to target directory")
return parser
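
The comparison above deliberately keys on only the `SN`/`LN`/`M5` fields of each `@SQ` record, so two BAMs aligned to the same reference stored at different paths (differing `SQ.UR`) are not reported as mixed (CW-4842). A pysam-free sketch of that normalisation, using plain dicts in place of parsed headers:

```python
"""Sketch of the @SQ comparison in check_bam_headers_in_dir.py: only
SN/LN/M5 identify a reference sequence, so headers differing only in
UR (e.g. local vs remote reference paths) compare as equal."""


def normalise_sq(sq_records):
    """Keep only the fields that identify the reference sequences."""
    return [
        {"SN": sq["SN"], "LN": sq["LN"], "M5": sq.get("M5")}
        for sq in sq_records
    ]


def have_mixed_headers(headers):
    """Return True if any two headers disagree after normalisation."""
    normalised = [normalise_sq(h) for h in headers]
    return any(h != normalised[0] for h in normalised[1:])
```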
28 changes: 28 additions & 0 deletions bin/workflow_glue/check_sample_sheet.py
@@ -38,6 +38,7 @@ def main(args):
barcodes = []
aliases = []
sample_types = []
analysis_groups = []
allowed_sample_types = [
"test_sample", "positive_control", "negative_control", "no_template_control"
]
@@ -49,6 +50,21 @@
try:
encoding = determine_codec(args.sample_sheet)
with open(args.sample_sheet, "r", encoding=encoding) as f:
try:
# Excel files don't throw any error until here
csv.Sniffer().sniff(f.readline())
f.seek(0) # return to initial position again
except Exception as e:
# Excel fails with UniCode error
sys.stdout.write(
"The sample sheet doesn't seem to be a CSV file.\n"
"The sample sheet has to be a CSV file.\n"
"Please verify that the sample sheet is a CSV file.\n"
f"Parsing error: {e}"
)

sys.exit()

csv_reader = csv.DictReader(f)
n_row = 0
for row in csv_reader:
@@ -76,6 +92,10 @@
sample_types.append(row["type"])
except KeyError:
pass
try:
analysis_groups.append(row["analysis_group"])
except KeyError:
pass
except Exception as e:
sys.stdout.write(f"Parsing error: {e}")
sys.exit()
@@ -121,6 +141,14 @@
sys.stdout.write(
f"Sample sheet requires at least 1 of {required_type}")
sys.exit()
if analysis_groups:
# if there was a "analysis_group" column, make sure it had values for all
# samples
if not all(analysis_groups):
sys.stdout.write(
"if an 'analysis_group' column exists, it needs values in each row"
)
sys.exit()

logger.info(f"Checked sample sheet {args.sample_sheet}.")

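
The two checks added to `check_sample_sheet.py` above can be exercised on their own: `csv.Sniffer` rejects input that is not delimited text, and an `analysis_group` column, when present, must be filled in every row. A minimal sketch with hypothetical helper names, reading from in-memory strings instead of a file:

```python
"""Sketch of the two new sample-sheet checks: dialect sniffing and
analysis_group completeness. Helper names are illustrative only."""
import csv
import io


def looks_like_csv(text):
    """Return True if csv.Sniffer can detect a dialect in the first line."""
    try:
        csv.Sniffer().sniff(io.StringIO(text).readline())
        return True
    except csv.Error:
        # e.g. "Could not determine delimiter" for non-delimited input
        return False


def analysis_groups_complete(text):
    """True when there is no analysis_group column, or every row has a value."""
    groups = [
        row["analysis_group"]
        for row in csv.DictReader(io.StringIO(text))
        if "analysis_group" in row
    ]
    return all(groups)
```

The real script catches a broader `Exception`, since an Excel file opened in text mode can fail with a `UnicodeDecodeError` before the sniffer even runs.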
43 changes: 43 additions & 0 deletions bin/workflow_glue/check_xam_index.py
@@ -0,0 +1,43 @@
"""Validate a single (u)BAM file index."""

from pathlib import Path
import sys

import pysam

from .util import get_named_logger, wf_parser # noqa: ABS101


def validate_xam_index(xam_file):
"""Use fetch to validate the index.
Invalid indexes will fail the call with a ValueError:
ValueError: fetch called on bamfile without index
"""
with pysam.AlignmentFile(xam_file, check_sq=False) as alignments:
try:
alignments.fetch()
has_valid_index = True
except ValueError:
has_valid_index = False
return has_valid_index


def main(args):
"""Run the entry point."""
logger = get_named_logger("checkBamIdx")

# Check if a XAM has a valid index
has_valid_index = validate_xam_index(args.input_xam)
# write `has_valid_index` out so that they can be set as env.
sys.stdout.write(
f"HAS_VALID_INDEX={int(has_valid_index)}"
)
logger.info(f"Checked (u)BAM index for: '{args.input_xam}'.")


def argparser():
"""Argument parser for entrypoint."""
parser = wf_parser("check_xam_index")
parser.add_argument("input_xam", type=Path, help="Path to target XAM")
return parser
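
Both checker scripts communicate with the pipeline through a small `KEY=int;KEY=int` contract on stdout (e.g. `IS_UNALIGNED=0;MIXED_HEADERS=0;IS_SORTED=1`, or `HAS_VALID_INDEX=1` here). A sketch of a consuming side, hypothetical since the real workflow captures these as shell environment variables rather than parsing them in Python:

```python
def parse_flags(stdout_text):
    """Parse the 'KEY=1;KEY2=0' pairs emitted by the checker scripts
    into a dict of booleans."""
    return {
        key: bool(int(value))
        for key, value in (
            pair.split("=") for pair in stdout_text.split(";")
        )
    }
```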
