Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add Orthanq wrappers for hla and virus applications #2640

Open
wants to merge 38 commits into
base: master
Choose a base branch
from

Conversation

huzuner
Copy link
Contributor

@huzuner huzuner commented Feb 4, 2024

QC

  • I confirm that:

For all wrappers added by this PR,

  • there is a test case which covers any introduced changes,
  • input: and output: file paths in the resulting rule can be changed arbitrarily,
  • either the wrapper can only use a single core, or the example rule contains a threads: x statement with x being a reasonable default,
  • rule names in the test case are in snake_case and somehow tell what the rule is about or match the tools purpose or name (e.g., map_reads for a step that maps reads),
  • all environment.yaml specifications follow the respective best practices,
  • the environment.yaml pinning has been updated by running snakedeploy pin-conda-envs environment.yaml on a linux machine,
  • wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in input: or output:),
  • all fields of the example rules in the Snakefiles and their entries are explained via comments (input:/output:/params: etc.),
  • stderr and/or stdout are logged correctly (log:), depending on the wrapped tool,
  • temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function tempfile.gettempdir() points to (see here; this also means that using any Python tempfile default behavior works),
  • the meta.yaml contains a link to the documentation of the respective tool or command,
  • Snakefiles pass the linting (snakemake --lint),
  • Snakefiles are formatted with snakefmt,
  • Python wrapper scripts are formatted with black.
  • Conda environments use a minimal amount of channels, in recommended ordering. E.g. for bioconda, use (conda-forge, bioconda, nodefaults, as conda-forge should have highest priority and defaults channels are usually not needed because most packages are in conda-forge nowadays).

Summary by CodeRabbit

  • New Features

    • Added comprehensive HLA and virus quantification workflows using the Orthanq tool.
    • Introduced configuration files for environment setup, metadata, and test scenarios.
    • Implemented Snakemake rules for HLA and virus processing steps.
  • Documentation

    • Added detailed metadata and documentation for each workflow component.
    • Included GitHub repository links and author information.
  • Chores

    • Created wrapper scripts for executing Orthanq commands.
    • Set up conda environment configurations for dependency management.

@huzuner huzuner changed the title Add Orthanq wrapper feat: add Orthanq wrapper Feb 4, 2024
@huzuner huzuner changed the title feat: add Orthanq wrapper feat: add Orthanq candidates hla wrapper Feb 4, 2024
@huzuner huzuner changed the title feat: add Orthanq candidates hla wrapper feat: add Orthanq candidates hla and virus wrappers Feb 5, 2024
@fgvieira
Copy link
Collaborator

Don't have any experience with orthanq but, from the command line, it seems that it could make more sense to have candidates\hla and cadidates\virus, no?

It seems candidates\hla uses a temp folder; can you use the tmpdir resource?

@huzuner
Copy link
Contributor Author

huzuner commented Mar 21, 2024

Don't have any experience with orthanq but, from the command line, it seems that it could make more sense to have candidates\hla and cadidates\virus, no?

It seems candidates\hla uses a temp folder; can you use the tmpdir resource?

Yes that could make sense aswell. But I was thinking of putting everything related to the each application in one folder, to make the navigation within the usage easier. Therefore the commands are as follows:

orthanq candidates hla
orthanq preprocess hla
orthanq call hla

and

orthanq candidates virus
orthanq preprocess virus
orthanq call virus

also could you eloborate more regarding the temp usage?

Thank you for all the reviewing :)

@fgvieira
Copy link
Collaborator

fgvieira commented May 10, 2024

From what I can see in the docs, it seems that the tool has three subcommands and hla/virus is an option. If so, I'd organize the wrapper the same way as (e.g.) samtools (orthanq/candidates, orthanq/preprocess, and orthanq/call) with hla/virus as a parameter.
However, since orthanq is developed by @johanneskoester lab, I think it would be nice to hear from him.

If the program supports a temp folder, then it would be nice if the wrappers would take care of it. For example, as in:

with tempfile.TemporaryDirectory() as tmpdir:
tmp_prefix = Path(tmpdir) / "samtools_sort"
shell(
"samtools sort {samtools_opts} -m {mem_per_thread_mb}M {extra} -T {tmp_prefix} {snakemake.input[0]} {log}"
)

@huzuner huzuner changed the title feat: add Orthanq candidates hla and virus wrappers feat: add Orthanq wrappers for hla and virus applications Jun 23, 2024
Copy link
Contributor

github-actions bot commented Jan 1, 2025

This PR was marked as stale because it has been open for 6 months with no activity.

@github-actions github-actions bot added the Stale label Jan 1, 2025
Copy link
Contributor

coderabbitai bot commented Jan 1, 2025

Warning

Rate limit exceeded

@huzuner has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 48 minutes and 54 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between c7c1de7 and 3fe3daf.

📒 Files selected for processing (6)
  • bio/orthanq/hla/call/environment.yaml (1 hunks)
  • bio/orthanq/hla/candidates/environment.yaml (1 hunks)
  • bio/orthanq/hla/preprocess/environment.yaml (1 hunks)
  • bio/orthanq/virus/call/environment.yaml (1 hunks)
  • bio/orthanq/virus/candidates/environment.yaml (1 hunks)
  • bio/orthanq/virus/preprocess/environment.yaml (1 hunks)
📝 Walkthrough

Walkthrough

The pull request introduces a comprehensive set of configuration and workflow files for the Orthanq bioinformatics tool, specifically focusing on HLA (Human Leukocyte Antigen) and virus processing workflows. The changes include environment configuration files, metadata descriptions, Snakefile rules, test data, and wrapper scripts for different subcommands such as candidates, preprocess, and call functionalities. These additions provide a structured approach to processing genomic data for HLA typing and viral lineage quantification using the Orthanq tool version 1.4.0.

Changes

File Path Change Summary
bio/orthanq/*/environment.yaml Added conda environment configuration files with channels conda-forge, bioconda, nodefaults and orthanq =1.4.0 dependency
bio/orthanq/*/meta.yaml Introduced metadata files for HLA and virus subcommands, specifying name, description, inputs, outputs, and author information
bio/orthanq/*/test/Snakefile Added Snakemake rules for HLA and virus subcommands (candidates, preprocess, call) with input/output specifications
bio/orthanq/*/wrapper.py Created Python wrapper scripts for executing Orthanq subcommands within Snakemake workflows
bio/orthanq/hla/candidates/test/hla.xml Added XML file with detailed HLA allele information
bio/orthanq/hla/candidates/test/genome.fasta Introduced test genome sequence
bio/orthanq/hla/candidates/test/hla_alleles.fasta Added HLA allele sequence data

Sequence Diagram

sequenceDiagram
    participant User
    participant Snakemake
    participant Orthanq
    
    User->>Snakemake: Trigger workflow
    Snakemake->>Orthanq: Execute candidates subcommand
    Orthanq-->>Snakemake: Generate candidate variants
    Snakemake->>Orthanq: Execute preprocess subcommand
    Orthanq-->>Snakemake: Generate BCF file
    Snakemake->>Orthanq: Execute call subcommand
    Orthanq-->>Snakemake: Generate quantification results
    Snakemake-->>User: Workflow complete
Loading

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (13)
bio/orthanq/hla/preprocess/wrapper.py (1)

9-13: Consider incorporating thread support.
If orthanq preprocess hla supports parallelization, passing snakemake.threads can fulfill the PR objective of flexible threading configuration.

Apply the following diff to pass the number of threads:

 shell(
     "orthanq preprocess hla --genome {snakemake.input.genome} "
     " --haplotype-variants {snakemake.input.haplotype_variants} --reads {snakemake.input.reads} "
-    " --vg-index {snakemake.input.vg_index} --output {snakemake.output[0]} {log}"
+    " --vg-index {snakemake.input.vg_index} --threads {snakemake.threads} --output {snakemake.output[0]} {log}"
 )
bio/orthanq/hla/call/test/Snakefile (2)

8-8: Consider parameter configurability.
Having prior="diploid" is acceptable for defaults, but consider whether this parameter might need to be externally configurable in certain contexts.


9-10: Add thread or resource definition if needed.
If orthanq_call_hla can be parallelized or is computationally intensive, defining threads or additional resources (e.g., memory) would help with workload distribution.

bio/orthanq/hla/call/meta.yaml (4)

2-3: Enhance description with more details about input/output formats.

The description should specify:

  • The expected format of the input BCF file
  • The structure/columns of the output CSV file
  • Any constraints or requirements for the process

5-7: Add version requirements and format specifications for input files.

For better usability:

  • Specify the required version/format of haplotype variants
  • Indicate the format of haplotype calls
  • Add a direct link to the XML download instructions

8-9: Expand parameter documentation.

For the prior parameter:

  • Explain the implications of each option (uniform vs. diploid)
  • Document any default values
  • Add examples of when to use each option

10-11: Specify output format details.

The output description should include:

  • CSV file structure (columns, data types)
  • Any header information
  • Example output snippet
bio/orthanq/hla/candidates/test/hla.xml (1)

61-62: Consider reducing test data size.

The current test file contains complete sequence and translation data. Consider:

  • Using a smaller subset of features for testing
  • Creating a minimal test file that still validates functionality
  • Documenting why this specific allele was chosen for testing

Also applies to: 122-124

bio/orthanq/virus/candidates/test/Snakefile (1)

1-7: Optionally define resource usage.
If these rules run on HPC or multi-core environments, consider specifying resources (e.g., threads, memory) for more predictable scheduling.

bio/orthanq/virus/preprocess/test/Snakefile (1)

1-10: Resources or threads usage.
Consider adding explicit resource or thread specifications to ensure consistent performance across different systems, especially in HPC environments.

bio/orthanq/virus/candidates/wrapper.py (1)

1-11: Add documentation and usage examples.

The wrapper would benefit from comprehensive documentation:

  1. Add a docstring explaining:
    • Required input parameters
    • Output format
    • Available configuration options
  2. Include example usage
  3. Reference to test cases location

Add documentation at the top of the file:

+"""
+Wrapper for Orthanq virus candidates generation.
+
+Required input:
+    reads: Input reads file(s)
+
+Required output:
+    Candidate variants file (VCF format)
+
+Optional parameters:
+    threads: Number of threads to use
+    temp_dir: Directory for temporary files
+    extra: Additional parameters to pass to Orthanq
+
+Example usage:
+    rule orthanq_candidates_virus:
+        input:
+            reads="mapped/virus.bam"
+        output:
+            "candidates/virus.vcf"
+        params:
+            temp_dir="temp",
+            extra=""
+        threads: 4
+        wrapper:
+            "bio/orthanq/virus/candidates"
+"""
+
 __author__ = "Hamdiye Uzuner"
🧰 Tools
🪛 Ruff (0.8.2)

7-7: Undefined name snakemake

(F821)

bio/orthanq/hla/preprocess/test/Snakefile (2)

3-6: Consider introducing a temporary or ephemeral directory for intermediate files.

If the preprocessing steps generate large or sensitive intermediate files, you may want to isolate them in a temporary directory to keep the workspace uncluttered and facilitate cleanup. You could modify the rule parameters to include a temporary directory, for example:

 rule orthanq_preprocess_hla:
     input:
         genome="genome.fasta",
         haplotype_variants="haplotype_variants.vcf",
         vg_index="hprc_prebuilt.xg",
         reads=["reads_1.fq", "reads_2.fq"]
+    params:
+        tempdir=temp("orthanq_preprocess_hla_temp")
     output:
         "calls.bcf",

1-12: Consider adding a threads directive for performance.

If the Orthanq preprocessing step supports multithreading, you can improve performance by specifying threads: in the rule:

 rule orthanq_preprocess_hla:
+    threads: 4
     input:
         ...
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9161b48 and d04db4a.

⛔ Files ignored due to path filters (1)
  • bio/orthanq/hla/candidates/test/allele_freq.csv is excluded by !**/*.csv
📒 Files selected for processing (28)
  • bio/orthanq/hla/call/environment.yaml (1 hunks)
  • bio/orthanq/hla/call/meta.yaml (1 hunks)
  • bio/orthanq/hla/call/test/Snakefile (1 hunks)
  • bio/orthanq/hla/call/test/hla.xml (1 hunks)
  • bio/orthanq/hla/call/wrapper.py (1 hunks)
  • bio/orthanq/hla/candidates/environment.yaml (1 hunks)
  • bio/orthanq/hla/candidates/meta.yaml (1 hunks)
  • bio/orthanq/hla/candidates/test/Snakefile (1 hunks)
  • bio/orthanq/hla/candidates/test/genome.fasta (1 hunks)
  • bio/orthanq/hla/candidates/test/hla.xml (1 hunks)
  • bio/orthanq/hla/candidates/test/hla_alleles.fasta (1 hunks)
  • bio/orthanq/hla/candidates/wrapper.py (1 hunks)
  • bio/orthanq/hla/preprocess/environment.yaml (1 hunks)
  • bio/orthanq/hla/preprocess/meta.yaml (1 hunks)
  • bio/orthanq/hla/preprocess/test/Snakefile (1 hunks)
  • bio/orthanq/hla/preprocess/wrapper.py (1 hunks)
  • bio/orthanq/virus/call/environment.yaml (1 hunks)
  • bio/orthanq/virus/call/meta.yaml (1 hunks)
  • bio/orthanq/virus/call/test/Snakefile (1 hunks)
  • bio/orthanq/virus/call/wrapper.py (1 hunks)
  • bio/orthanq/virus/candidates/environment.yaml (1 hunks)
  • bio/orthanq/virus/candidates/meta.yaml (1 hunks)
  • bio/orthanq/virus/candidates/test/Snakefile (1 hunks)
  • bio/orthanq/virus/candidates/wrapper.py (1 hunks)
  • bio/orthanq/virus/preprocess/environment.yaml (1 hunks)
  • bio/orthanq/virus/preprocess/meta.yaml (1 hunks)
  • bio/orthanq/virus/preprocess/test/Snakefile (1 hunks)
  • bio/orthanq/virus/preprocess/wrapper.py (1 hunks)
✅ Files skipped from review due to trivial changes (13)
  • bio/orthanq/hla/candidates/test/genome.fasta
  • bio/orthanq/hla/candidates/environment.yaml
  • bio/orthanq/virus/call/environment.yaml
  • bio/orthanq/hla/call/environment.yaml
  • bio/orthanq/virus/preprocess/environment.yaml
  • bio/orthanq/virus/candidates/environment.yaml
  • bio/orthanq/hla/preprocess/environment.yaml
  • bio/orthanq/hla/candidates/test/hla_alleles.fasta
  • bio/orthanq/virus/candidates/meta.yaml
  • bio/orthanq/hla/preprocess/meta.yaml
  • bio/orthanq/virus/preprocess/meta.yaml
  • bio/orthanq/virus/call/meta.yaml
  • bio/orthanq/hla/candidates/meta.yaml
🧰 Additional context used
📓 Path-based instructions (6)
bio/orthanq/hla/preprocess/wrapper.py (2)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.


Pattern **/wrapper.py: Do not complain about use of undefined variable called snakemake.

bio/orthanq/virus/preprocess/wrapper.py (2)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.


Pattern **/wrapper.py: Do not complain about use of undefined variable called snakemake.

bio/orthanq/hla/candidates/wrapper.py (2)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.


Pattern **/wrapper.py: Do not complain about use of undefined variable called snakemake.

bio/orthanq/virus/call/wrapper.py (2)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.


Pattern **/wrapper.py: Do not complain about use of undefined variable called snakemake.

bio/orthanq/hla/call/wrapper.py (2)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.


Pattern **/wrapper.py: Do not complain about use of undefined variable called snakemake.

bio/orthanq/virus/candidates/wrapper.py (2)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.


Pattern **/wrapper.py: Do not complain about use of undefined variable called snakemake.

🪛 Ruff (0.8.2)
bio/orthanq/hla/preprocess/wrapper.py

7-7: Undefined name snakemake

(F821)

bio/orthanq/virus/preprocess/wrapper.py

7-7: Undefined name snakemake

(F821)

bio/orthanq/hla/candidates/wrapper.py

7-7: Undefined name snakemake

(F821)

bio/orthanq/virus/call/wrapper.py

7-7: Undefined name snakemake

(F821)


8-8: Undefined name snakemake

(F821)

bio/orthanq/hla/call/wrapper.py

7-7: Undefined name snakemake

(F821)


8-8: Undefined name snakemake

(F821)

bio/orthanq/virus/candidates/wrapper.py

7-7: Undefined name snakemake

(F821)

🔇 Additional comments (24)
bio/orthanq/virus/preprocess/wrapper.py (3)

1-3: All metadata is properly declared.

The licensing and attribution metadata at the top is clear and well-defined.


7-7: Valid usage of snakemake context.

This usage of snakemake.log_fmt_shell is a standard approach in Snakemake wrappers to capture logs. Nothing to address here.

🧰 Tools
🪛 Ruff (0.8.2)

7-7: Undefined name snakemake

(F821)


9-12: Ensure the subcommand parameters match expected workflow inputs.

The shell command for “orthanq preprocess virus” looks correct for the specified input and output. Just verify that downstream steps are ready to consume {snakemake.output[0]}.

✅ Verification successful

The output file is consistently handled across the codebase

The search results show that snakemake.output[0] is a standard way to access the first output file in Snakemake wrappers across the codebase. The pattern is used consistently in many wrappers for different tools, and in each case, it's used to specify the primary output file path. This is exactly how it's being used in the orthanq preprocess virus wrapper.

Looking specifically at the orthanq-related wrappers:

  • bio/orthanq/virus/preprocess/wrapper.py (the file under review)
  • bio/orthanq/virus/candidates/wrapper.py
  • bio/orthanq/virus/call/wrapper.py
  • bio/orthanq/hla/candidates/wrapper.py
  • bio/orthanq/hla/preprocess/wrapper.py
  • bio/orthanq/hla/call/wrapper.py

All of these wrappers use snakemake.output[0] consistently as the output file parameter for their respective commands, indicating that this is the established pattern for orthanq wrappers.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for usage of {snakemake.output[0]} in the rest of the codebase to ensure consistent handling.
rg -A 3 "snakemake.output\[0\]"

Length of output: 37130

bio/orthanq/hla/preprocess/wrapper.py (2)

1-3: Metadata aligns with best practices.
The author, copyright, and license declarations are all in order, matching typical conventions.


7-7: Logging approach is appropriate for Snakemake.
Utilizing Snakemake’s built-in log formatting ensures that stdout and stderr are captured properly.

🧰 Tools
🪛 Ruff (0.8.2)

7-7: Undefined name snakemake

(F821)

bio/orthanq/hla/call/test/Snakefile (2)

1-1: Rule naming is clear and consistent.
The rule name orthanq_call_hla follows snake_case convention and accurately reflects the function's intended use.


2-5: Ensure upstream availability of input files.
Kindly verify that haplotype_calls.bcf, haplotype_variants.vcf, and hla.xml are created or available upstream in the workflow before this rule executes.

bio/orthanq/hla/candidates/test/Snakefile (4)

1-1: Rule name matches typical wrapper usage.
Using orthanq_candidates_hla clarifies the process being run.


2-5: Verify existence of required input files.
Please confirm that allele_freq.csv, hla_alleles.fasta, genome.fasta, and hla.xml are generated or provided by upstream rules or input channels.


8-8: Output extension alignment.
candidates.vcf has a VCF extension, which aligns well with typical variant output. Confirm that any downstream rules are prepared to handle this file.


9-10: Log management approach looks good.
Capturing logs in logs/orthanq_candidates/candidates_hla.log allows for easier debugging and traceability.

bio/orthanq/hla/candidates/wrapper.py (2)

1-3: Metadata is helpful.
Author and license information are clearly stated.


9-13: Wrapper command structure is clear.
The invocation of orthanq candidates hla correctly references input files and expected outputs. Proper logging is also configured.

bio/orthanq/hla/call/wrapper.py (3)

1-3: Author and license metadata.
Including author, copyright, and license details helps maintain clarity of ownership and usage terms.


7-8: Parameter usage is concise.
prior_opt = snakemake.params.get("prior", "") is a clean way to handle optional parameters.

🧰 Tools
🪛 Ruff (0.8.2)

7-7: Undefined name snakemake

(F821)


8-8: Undefined name snakemake

(F821)


9-13: Well-structured shell command.
All relevant inputs, including haplotype_calls and haplotype_variants, are properly integrated into the command. Logging is correctly handled.

bio/orthanq/hla/candidates/test/hla.xml (1)

1-127: Verify test data licensing and attribution.

This appears to be data from the IPD-IMGT/HLA database. Please ensure:

  1. The test data usage complies with the database's terms of use
  2. Proper attribution is included in the repository
  3. The data version is documented
bio/orthanq/virus/candidates/test/Snakefile (1)

3-3: Consider explicitly naming output files.
What files are created in the viral_candidates directory? Aligning with a past request, try naming key outputs for clarity.

bio/orthanq/virus/call/test/Snakefile (1)

1-11: Validate parameter usage.
Double-check that "uniform" is the intended default for the prior parameter. If you anticipate user-defined priors, consider exposing a parameter option in your workflow configuration.

bio/orthanq/virus/candidates/wrapper.py (3)

1-4: LGTM!

The metadata section is complete with appropriate author, copyright, and license information.


5-7: LGTM!

The imports and logging setup follow Snakemake conventions correctly.

🧰 Tools
🪛 Ruff (0.8.2)

7-7: Undefined name snakemake

(F821)


9-11: 🛠️ Refactor suggestion

Add essential parameters and error handling to the Orthanq command.

The current implementation appears to be missing several important parameters:

  1. Input files/parameters for virus candidate generation
  2. Threading configuration for performance optimization
  3. Temporary directory usage (mentioned in PR discussion)
  4. Error handling for missing or invalid files

Consider updating the shell command to include these parameters:

shell(
-    "orthanq candidates virus --output {snakemake.output[0]} {log}"
+    "orthanq candidates virus"
+    " --input {snakemake.input.reads}"
+    " --output {snakemake.output[0]}"
+    " --threads {snakemake.threads}"
+    " --temp-dir {snakemake.params.temp_dir}"
+    " {snakemake.params.extra}"
+    " {log}"
)

Let's verify the Orthanq command parameters:

bio/orthanq/hla/preprocess/test/Snakefile (2)

1-1: Rule name follows snake_case best practices.


8-8: Check if a BCF index is required.

Typically, BCF files benefit from an accompanying index (e.g., CSI or TBI) if further downstream analysis is expected. Please verify if the downstream steps need an indexed BCF and consider adding it to the output. Here’s a possible snippet:

 output:
-    "calls.bcf",
+    "calls.bcf",
+    "calls.bcf.csi",

bio/orthanq/hla/call/test/hla.xml Show resolved Hide resolved
bio/orthanq/virus/call/wrapper.py Outdated Show resolved Hide resolved
@fgvieira
Copy link
Collaborator

fgvieira commented Jan 3, 2025

Hi @huzuner, did you have the time to look into the comments above?

@fgvieira fgvieira removed the Stale label Jan 3, 2025
huzuner and others added 3 commits January 6, 2025 15:53
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@huzuner
Copy link
Contributor Author

huzuner commented Jan 6, 2025

From what I can see in the docs, it seems that the tool has three subcommands and hla/virus is an option. If so, I'd organize the wrapper the same way as (e.g.) samtools (orthanq/candidates, orthanq/preprocess, and orthanq/call) with hla/virus as a parameter. However, since orthanq is developed by @johanneskoester lab, I think it would be nice to hear from him.

If the program supports a temp folder, then it would be nice if the wrappers would take care of it. For example, as in:

with tempfile.TemporaryDirectory() as tmpdir:
tmp_prefix = Path(tmpdir) / "samtools_sort"
shell(
"samtools sort {samtools_opts} -m {mem_per_thread_mb}M {extra} -T {tmp_prefix} {snakemake.input[0]} {log}"
)

I thought about your suggestion but I couldn't be sure whether that would be a better way of organizing it or not. My idea initially was that because the tool has two applications, hla and virus, I wanted to organize the folders in that manner in order to make this more prominent, but your idea also does not seem wrong. In the end I guess I'd rather have it as it is now if that's not really a major issue. We can of course always reorganize it later.

In addition, the tool does not have a temp option at the moment.

@huzuner huzuner requested a review from fgvieira January 6, 2025 15:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants