Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

more 2.0 milestones #208

Merged
merged 13 commits into from
Feb 17, 2025
Merged

more 2.0 milestones #208

merged 13 commits into from
Feb 17, 2025

Conversation

pdimens
Copy link
Owner

@pdimens pdimens commented Feb 17, 2025

This PR:

Summary by CodeRabbit

  • New Features
    • Introduced a new diagnose command to verify workflows.
    • Launched a dedicated simulation command group for variant simulations and linked-read processing.
  • CLI Enhancements
    • Upgraded verbosity options to a three-level numeric scale for more granular output control.
    • Adjusted thread limits across commands for improved resource management.
  • Workflow Updates
    • Refined input validations and pattern matching to streamline command-line interactions.

Copy link

coderabbitai bot commented Feb 17, 2025

Warning

Rate limit exceeded

@pdimens has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 21 minutes and 0 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 5a108fe and e233384.

📒 Files selected for processing (2)
  • .github/workflows/tests.yml (18 hunks)
  • harpy/simulate_linkedreads.py (1 hunks)
📝 Walkthrough

Walkthrough

This pull request introduces extensive updates across the project. The changes standardize command‐line options by converting the --quiet flag into a choice with a conversion utility and capping the --threads option at 999. Numerous modules gain type annotations for improved clarity. The CLI command groups are reorganized (with new commands added, such as diagnose and a separate simulate_linkedreads command, and removal of linkedreads functionality from simulate_variants), and many Snakemake workflow files now use raw string notation for regex constraints. Additionally, GitHub workflows and filters are updated to reflect the new file naming and parameters.

Changes

File(s) Change Summary
harpy/__main__.py Updated CLI organization: created new command groups (e.g., simulate, Troubleshoot, Workflows), re-assigned commands, and restructured simulate commands (using simulate_linkedreads and simulate_variants).
harpy/_cli_types_generic.py, harpy/_misc.py, harpy/_parsers.py, harpy/_printing.py, harpy/_validations.py Added and updated type hints; introduced the convert_to_int utility function; removed/altered unsupported options in Snakemake parameters.
harpy/{align,assembly,deconvolve,demultiplex,downsample,impute,metassembly,phase,preflight,qc,resume,snp,sv}.py Modified CLI option definitions: --quiet changed from a boolean flag to a choice with a callback (convert_to_int), and --threads options now use a maximum value of 999 along with updated help texts.
harpy/snakefiles/... Updated regex wildcard constraints (e.g., for sample, population, etc.) by converting string patterns to raw strings; in some cases (e.g., demultiplex_gen1.smk), rule outputs and logging were adjusted.
harpy/{diagnose.py,simulate_linkedreads.py,simulate_variants.py} Added a new diagnose command; created a separate CLI for simulate_linkedreads and removed the linkedreads command from simulate_variants; updated quiet option handling in these commands.
.github/{workflows/tests.yml,filters.yml} Updated GitHub CI: modified harpy command invocations to include a quiet argument (2) and renamed simulation file references in filters.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant DiagnoseCmd
    participant FS as FileSystem
    participant YAML
    participant Subproc as Subprocess

    User->>DiagnoseCmd: Run diagnose <directory>
    DiagnoseCmd->>FS: Check for workflow/config.yaml
    alt Config exists
        DiagnoseCmd->>FS: Read configuration file
        DiagnoseCmd->>YAML: Parse YAML content
        DiagnoseCmd->>Subproc: Construct and execute snakemake debug command
        Subproc-->>DiagnoseCmd: Return command output
        DiagnoseCmd->>User: Display formatted output
    else Config missing
        DiagnoseCmd->>User: Print error and exit(1)
    end
Loading
sequenceDiagram
    participant User
    participant SimLinkCmd
    participant Validator
    participant DirMgr as DirectoryManager
    participant Snakemake

    User->>SimLinkCmd: Invoke linkedreads with genome files & options
    SimLinkCmd->>Validator: Validate FASTA and barcode files
    Validator-->>SimLinkCmd: Return validation status
    SimLinkCmd->>DirMgr: Set up output directory structure
    SimLinkCmd->>SimLinkCmd: Construct Snakemake command with parameters
    SimLinkCmd->>Snakemake: Execute simulation workflow
    Snakemake-->>SimLinkCmd: Return workflow log/output
    SimLinkCmd->>User: Display execution log and status
Loading

Possibly related PRs


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (4)
harpy/qc.py (1)

10-10: Remove unused import.

The convert_to_int import is not used in this file.

-from ._cli_types_generic import convert_to_int, HPCProfile, IntList, SnakemakeParams
+from ._cli_types_generic import HPCProfile, IntList, SnakemakeParams
🧰 Tools
🪛 Ruff (0.8.2)

10-10: ._cli_types_generic.convert_to_int imported but unused

Remove unused import: ._cli_types_generic.convert_to_int

(F401)

harpy/snakefiles/demultiplex_gen1.smk (1)

44-44: Consider removing or conditionally enabling debug print.

The print statement for sample names could be noisy in production. Consider:

  1. Removing it if it's no longer needed for debugging
  2. Making it conditional based on a debug flag
  3. Using the logging system instead of print
-print(samplenames)
+logger.debug(f"Sample names: {samplenames}")
harpy/_launch.py (1)

64-64: Consistent implementation of quiet level checks.

The quiet level checks are consistently implemented across the file, ensuring uniform output control. However, consider extracting these magic numbers into named constants for better maintainability.

+# At the top of the file
+QUIET_LEVEL_VERBOSE = 0
+QUIET_LEVEL_PROGRESS = 1
+QUIET_LEVEL_SILENT = 2

-if quiet < 2:
+if quiet < QUIET_LEVEL_SILENT:

Also applies to: 79-79, 187-187

harpy/simulate.py (1)

138-138: Consider extracting verbosity levels into constants.

The verbosity levels ("0", "1", "2") are repeated across multiple commands. Consider extracting these into module-level constants for better maintainability and to ensure consistency.

+# At the top of the file, after imports
+VERBOSITY_LEVELS = ["0", "1", "2"]
+VERBOSITY_DEFAULT = "0"

 # Then in each command's options
-@click.option('--quiet', show_default = True, default = "0", type = click.Choice(["0", "1", "2"]), callback = convert_to_int, help = 'Verbosity of output. `0` shows all output, `1` shows single progress bar, `2` suppressess all output')
+@click.option('--quiet', show_default = True, default = VERBOSITY_DEFAULT, type = click.Choice(VERBOSITY_LEVELS), callback = convert_to_int, help = 'Verbosity of output. `0` shows all output, `1` shows single progress bar, `2` suppressess all output')

Also applies to: 232-232, 355-355, 458-458, 565-565

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9b923ad and 2e8a4a0.

📒 Files selected for processing (41)
  • harpy/__main__.py (2 hunks)
  • harpy/_cli_types_generic.py (2 hunks)
  • harpy/_launch.py (7 hunks)
  • harpy/_misc.py (5 hunks)
  • harpy/_parsers.py (3 hunks)
  • harpy/_printing.py (7 hunks)
  • harpy/_validations.py (16 hunks)
  • harpy/align.py (4 hunks)
  • harpy/assembly.py (2 hunks)
  • harpy/deconvolve.py (2 hunks)
  • harpy/demultiplex.py (3 hunks)
  • harpy/diagnose.py (1 hunks)
  • harpy/downsample.py (2 hunks)
  • harpy/impute.py (2 hunks)
  • harpy/metassembly.py (2 hunks)
  • harpy/phase.py (2 hunks)
  • harpy/preflight.py (3 hunks)
  • harpy/qc.py (2 hunks)
  • harpy/resume.py (1 hunks)
  • harpy/simulate.py (6 hunks)
  • harpy/snakefiles/align_bwa.smk (1 hunks)
  • harpy/snakefiles/align_ema.smk (1 hunks)
  • harpy/snakefiles/align_strobealign.smk (1 hunks)
  • harpy/snakefiles/deconvolve.smk (1 hunks)
  • harpy/snakefiles/demultiplex_gen1.smk (3 hunks)
  • harpy/snakefiles/impute.smk (1 hunks)
  • harpy/snakefiles/phase.smk (1 hunks)
  • harpy/snakefiles/preflight_bam.smk (1 hunks)
  • harpy/snakefiles/preflight_fastq.smk (1 hunks)
  • harpy/snakefiles/qc.smk (1 hunks)
  • harpy/snakefiles/simulate_linkedreads.smk (1 hunks)
  • harpy/snakefiles/snp_freebayes.smk (1 hunks)
  • harpy/snakefiles/snp_mpileup.smk (1 hunks)
  • harpy/snakefiles/sv_leviathan.smk (1 hunks)
  • harpy/snakefiles/sv_leviathan_pop.smk (1 hunks)
  • harpy/snakefiles/sv_naibr.smk (1 hunks)
  • harpy/snakefiles/sv_naibr_phase.smk (1 hunks)
  • harpy/snakefiles/sv_naibr_pop.smk (1 hunks)
  • harpy/snakefiles/sv_naibr_pop_phase.smk (1 hunks)
  • harpy/snp.py (3 hunks)
  • harpy/sv.py (3 hunks)
✅ Files skipped from review due to trivial changes (8)
  • harpy/snakefiles/preflight_bam.smk
  • harpy/snakefiles/align_bwa.smk
  • harpy/snakefiles/phase.smk
  • harpy/snakefiles/sv_naibr.smk
  • harpy/snakefiles/sv_leviathan_pop.smk
  • harpy/snakefiles/snp_mpileup.smk
  • harpy/snakefiles/sv_naibr_pop.smk
  • harpy/snakefiles/impute.smk
🧰 Additional context used
🪛 Ruff (0.8.2)
harpy/diagnose.py

23-23: Local variable workflow is assigned to but never used

Remove assignment to unused variable workflow

(F841)

harpy/qc.py

10-10: ._cli_types_generic.convert_to_int imported but unused

Remove unused import: ._cli_types_generic.convert_to_int

(F401)

harpy/_validations.py

247-247: Use quiet == 2 instead of not quiet != 2

Replace with == operator

(SIM202)


311-311: Do not use bare except

(E722)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: qc
  • GitHub Check: Rebuild Container
🔇 Additional comments (58)
harpy/resume.py (1)

17-18: LGTM! CLI options improvements.

The changes improve type safety and verbosity control:

  1. Thread limit is now explicitly capped at 999
  2. Quiet option now provides granular control with clear documentation
harpy/__main__.py (1)

73-83: LGTM! Improved CLI organization.

The changes enhance the CLI structure by:

  1. Renaming "workflows" to "Workflows" for consistency
  2. Creating a new "Troubleshoot" group for diagnostic commands
  3. Properly organizing commands into logical groups
harpy/snakefiles/deconvolve.smk (1)

14-14: LGTM! Improved regex pattern safety.

Using a raw string for the regex pattern prevents potential issues with escape sequences.

harpy/snakefiles/preflight_fastq.smk (1)

14-14: LGTM! Good use of raw string for regex pattern.

Using a raw string for the regex pattern is a good practice as it prevents potential issues with backslash interpretation and improves readability.

harpy/deconvolve.py (3)

7-7: LGTM! Good addition of type conversion utility.

The import of convert_to_int supports the enhanced verbosity control.


42-42: LGTM! Improved docstring clarity.

The updated description better explains the function's purpose by focusing on the relationship between barcodes and molecules.


32-33: LGTM! Enhanced CLI options.

Good improvements to the CLI options:

  1. Setting a reasonable maximum thread limit prevents resource exhaustion
  2. The new verbosity levels provide better control over output
harpy/_printing.py (1)

12-12: LGTM! Great addition of type annotations.

The addition of type hints to all functions improves code clarity, helps catch type-related errors during development, and makes the API more self-documenting.

Also applies to: 25-25, 37-37, 50-50, 63-63, 69-69, 81-81, 98-98

harpy/downsample.py (2)

9-9: LGTM! Good addition of type conversion utility.

The import of convert_to_int supports the enhanced verbosity control.


32-33: LGTM! Enhanced CLI options.

Good improvements to the CLI options:

  1. Setting a reasonable maximum thread limit prevents resource exhaustion
  2. The new verbosity levels provide better control over output
harpy/metassembly.py (1)

37-37: LGTM! CLI options standardization.

The changes to --threads and --quiet options improve consistency and control across the CLI. The --quiet option now provides more granular control over output verbosity.

Also applies to: 41-41

harpy/impute.py (1)

34-34: LGTM! Consistent CLI options.

The changes to --threads and --quiet options align with the standardization across the codebase:

  • Thread limit is now capped at 999 for better resource control
  • Verbosity control is more granular with three levels

Also applies to: 39-39

harpy/demultiplex.py (2)

44-44: LGTM! Standardized CLI options.

The changes to --threads and --quiet options improve consistency with other commands.

Also applies to: 49-49


77-77:

❓ Verification inconclusive

Verify the impact of return_len parameter.

The validate_demuxschema call now includes return_len=False.

Let's check how this parameter affects the schema validation:


🏁 Script executed:

#!/bin/bash
# Search for the validate_demuxschema function definition and its usage
ast-grep --pattern 'def validate_demuxschema($_, $_) {
  $$$
}'

Length of output: 65


🏁 Script executed:

#!/bin/bash
# Search for the definition of validate_demuxschema using ripgrep with a looser pattern
rg -n "def[[:space:]]\+validate_demuxschema" .

Length of output: 48


Action Required: Verify the impact of the return_len parameter

The call in line 77 of harpy/demultiplex.py now sets return_len=False, but our searches couldn’t locate the definition of validate_demuxschema to confirm how this parameter is used. Please manually verify that this parameter is handled as expected—whether it affects the schema validation process or if it might be inherited from another module.

  • Check the implementation: Confirm if validate_demuxschema exists in another module or if its definition dynamically handles parameters.
  • Review tests: Ensure that tests covering schema validation capture any impact from return_len.
harpy/phase.py (1)

36-36: LGTM! Standardized CLI options.

The changes to --threads and --quiet options maintain consistency with other commands in the codebase:

  • Thread limit is capped at 999
  • Verbosity control offers three levels (0, 1, 2)

Also applies to: 42-42

harpy/qc.py (1)

36-36: LGTM!

The thread limit change from max_open=True to max=999 is consistent with similar changes in other files, providing a reasonable upper bound for thread count.

harpy/_parsers.py (1)

13-13: LGTM!

The addition of type hints to function signatures improves code readability and maintainability. The changes are consistent and well-documented:

  • getnames(directory: str, ext: str) -> list[str]
  • parse_fastq_inputs(inputs: list[str]) -> Tuple[list[str], int]
  • parse_alignment_inputs(inputs:list[str]) -> Tuple[list[str], int]
  • biallelic_contigs(vcf: str, workdir: str) -> Tuple[str,int]

Also applies to: 21-21, 69-69, 121-121

harpy/_misc.py (3)

18-32: LGTM!

The progress bar changes enhance user experience by:

  • Adding support for different verbosity levels through the quiet parameter
  • Replacing TextColumn with TaskProgressColumn for better progress display
  • Conditionally formatting progress text based on verbosity level

34-45: LGTM!

The pulse bar changes improve consistency with the progress bar by:

  • Adding type hints for parameters
  • Supporting verbosity control through the quiet parameter

47-134: LGTM!

The addition of type hints to function signatures improves code readability and maintainability:

  • symlink(original: str, destination: str) -> None
  • fetch_script(workdir: str, target: str) -> None
  • fetch_rule(workdir: str, target: str) -> None
  • fetch_report(workdir: str, target: str) -> None
  • snakemake_log(outdir: str, workflow: str) -> str
  • gzip_file(infile: str) -> None
  • safe_read(file_path: str)
🧰 Tools
🪛 Ruff (0.8.2)

96-96: Do not use bare except

(E722)


101-101: f-string without any placeholders

Remove extraneous f prefix

(F541)


110-110: Do not use bare except

(E722)


115-115: f-string without any placeholders

Remove extraneous f prefix

(F541)

harpy/assembly.py (2)

51-51: LGTM!

The thread limit change from max_open=True to max=999 is consistent with similar changes in other files, providing a reasonable upper bound for thread count.


55-55: LGTM!

The quiet option change enhances verbosity control by:

  • Replacing boolean flag with choice type ("0", "1", "2")
  • Using convert_to_int callback for type conversion
  • Adding clear help text explaining verbosity levels
harpy/snakefiles/qc.smk (1)

14-14: LGTM!

The change to use a raw string for the regex pattern ensures consistent and correct pattern matching behavior.

harpy/preflight.py (3)

7-7: LGTM!

The import of convert_to_int is required for the new quiet option implementation.


40-40: LGTM!

Setting a fixed upper bound of 999 threads is a reasonable limit that prevents potential resource exhaustion while still allowing for high parallelization.

Also applies to: 102-102


41-41: LGTM!

The change from a boolean flag to a choice type for the quiet option provides more granular control over output verbosity:

  • 0: Shows all output
  • 1: Shows single progress bar
  • 2: Suppresses all output

The convert_to_int callback ensures proper type conversion.

Also applies to: 106-106

harpy/snakefiles/snp_freebayes.smk (1)

14-14: LGTM!

The change to use a raw string for the regex pattern ensures consistent and correct pattern matching behavior.

harpy/snakefiles/simulate_linkedreads.smk (1)

17-17: LGTM!

The change to use a raw string for the regex pattern ensures consistent and correct pattern matching behavior.

harpy/snakefiles/sv_leviathan.smk (1)

15-15: LGTM! Good use of raw string for regex pattern.

The change to use raw string notation (r"...") for the regex pattern is a good practice as it ensures proper handling of backslashes and prevents unintended escape sequence interpretations.

harpy/_cli_types_generic.py (2)

6-10: LGTM! Clean implementation of string to integer conversion.

The convert_to_int function provides a simple and robust way to convert string choices to integers while properly handling None values.


101-101: LGTM! Good UX improvement for dry-run functionality.

The removal of --dry-run and -n options with an explicit failure message directing users to the new harpy diagnose command improves user experience by providing clear guidance.

Also applies to: 106-107

harpy/snakefiles/demultiplex_gen1.smk (3)

20-22: LGTM! Improved wildcard constraints.

The changes improve pattern matching reliability by:

  • Using raw string notation for regex patterns
  • Adding specific pattern \d{3} for the part wildcard

127-127: LGTM! More accurate log file name.

The log file name change from compression to concatenation better reflects the actual operation being performed.


131-131: LGTM! Efficient shell command optimization.

The change to pipe concatenation directly to gzip is more efficient as it avoids creating intermediate files.

harpy/snakefiles/sv_naibr_phase.smk (1)

15-15: LGTM! Good use of raw string for regex pattern.

The change to use raw string notation (r"...") for the regex pattern is a good practice as it ensures proper handling of backslashes and prevents unintended escape sequence interpretations.

harpy/_launch.py (4)

19-21: LGTM! Clear documentation of quiet levels.

The comments clearly explain the different verbosity levels, making it easier for developers to understand the output behavior.


144-145: Improve progress bar text based on quiet level.

Good adaptation of the progress bar text based on the quiet level, enhancing user experience.


162-162: Effective progress bar visibility control.

The visibility of individual task progress bars is correctly controlled based on the quiet level.


217-217: Improved comment clarity for keyboard interrupt handling.

The comment has been updated to be more descriptive and accurate.

harpy/snakefiles/align_strobealign.smk (1)

14-14: LGTM! Improved regex pattern handling.

Using raw string notation (r"...") for the wildcard constraint ensures proper handling of backslashes in the regex pattern.

harpy/snp.py (3)

8-8: LGTM! Added utility function import.

The import of convert_to_int supports the new quiet option implementation.


57-57: Reasonable thread limit implementation.

Setting a maximum thread limit of 999 is a good practice to prevent resource exhaustion.

Also applies to: 159-159


61-61: Enhanced quiet option with granular control.

The quiet option now provides better control over output verbosity with three distinct levels.

Also applies to: 163-163

harpy/snakefiles/sv_naibr_pop_phase.smk (1)

15-16: LGTM! Consistent regex pattern handling.

Using raw string notation (r"...") for both sample and population wildcard constraints ensures proper handling of backslashes in the regex patterns.

harpy/sv.py (2)

64-64: LGTM! Enhanced CLI options for better control.

The changes improve the CLI by:

  • Setting a reasonable thread limit of 999
  • Providing granular control over output verbosity with three levels

Also applies to: 69-69


162-162: LGTM! Consistent CLI options.

The changes maintain consistency with the leviathan function's CLI options.

Also applies to: 168-168

harpy/snakefiles/align_ema.smk (1)

14-14: LGTM! Using raw string for regex pattern.

Using a raw string for the regex pattern prevents unintended escape sequence processing and improves readability.

harpy/align.py (4)

10-10: LGTM! Added type conversion utility.

Added convert_to_int for safe conversion of string inputs to integers.


73-73: LGTM! Enhanced BWA CLI options.

The changes improve the BWA CLI by:

  • Setting a reasonable thread limit of 999
  • Providing granular control over output verbosity with three levels

Also applies to: 79-79


163-163: LGTM! Enhanced EMA CLI options.

The changes maintain consistency with the BWA CLI options.

Also applies to: 169-169


272-272: LGTM! Enhanced Strobe CLI options.

The changes maintain consistency with the BWA and EMA CLI options.

Also applies to: 278-278

harpy/_validations.py (1)

24-519: LGTM! Added type annotations for better type safety.

The addition of type annotations improves code maintainability and helps catch type-related bugs at compile time.

🧰 Tools
🪛 Ruff (0.8.2)

60-60: f-string without any placeholders

Remove extraneous f prefix

(F541)


63-63: f-string without any placeholders

Remove extraneous f prefix

(F541)


65-65: f-string without any placeholders

Remove extraneous f prefix

(F541)


228-228: Using .strip() with multi-character strings is misleading

(B005)


247-247: Use quiet == 2 instead of not quiet != 2

Replace with == operator

(SIM202)


311-311: Do not use bare except

(E722)


320-320: Do not use bare except

(E722)


324-324: Do not use bare except

(E722)


367-367: Do not use bare except

(E722)


472-472: Loop control variable future not used within loop body

Rename unused future to _future

(B007)

harpy/simulate.py (6)

7-7: LGTM!

The addition of convert_to_int import is well-placed and supports the new verbosity level functionality.


134-134: LGTM! CLI options improved.

The changes enhance the CLI by:

  • Setting a reasonable thread limit (999) to prevent resource exhaustion
  • Adding granular verbosity control with three levels

Also applies to: 138-138


232-232: LGTM!

The verbosity control change is consistent with other commands.


355-355: LGTM!

The verbosity control change maintains consistency across commands.


458-458: LGTM!

The verbosity control change follows the established pattern.


565-565: LGTM!

The verbosity control change completes the consistent implementation across all commands.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
harpy/_validations.py (1)

247-247: 🛠️ Refactor suggestion

Simplify the boolean expression.

The expression not quiet != 2 is confusing and can be simplified.

Apply this diff to improve readability:

-    if len(overlooked) > 0 and not quiet != 2:
+    if len(overlooked) > 0 and quiet == 2:
🧰 Tools
🪛 Ruff (0.8.2)

247-247: Use quiet == 2 instead of not quiet != 2

Replace with == operator

(SIM202)

🧹 Nitpick comments (2)
harpy/diagnose.py (1)

13-15: Enhance the docstring with more details.

The current docstring could be more descriptive. Consider adding:

  • Purpose of the diagnostic command
  • Expected output
  • Example usage
-    """
-    Run the Snakemake debugger to identify hang-ups 
-    """
+    """
+    Run the Snakemake debugger to identify workflow hang-ups and issues.
+    
+    This command executes Snakemake in debug mode to visualize the DAG (Directed Acyclic Graph)
+    of jobs and identify potential issues in the workflow configuration.
+    
+    Example:
+        harpy diagnose /path/to/workflow
+    """
.github/workflows/tests.yml (1)

112-112: Comment on commented-out block with verbosity flag.
A commented-out command now shows --quiet 2 (line 112). If you plan to re-enable this step later, please confirm that it conforms to the new CLI design (switching from boolean flags to choice values for verbosity).

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2e8a4a0 and 6306850.

📒 Files selected for processing (4)
  • .github/workflows/tests.yml (18 hunks)
  • harpy/_validations.py (16 hunks)
  • harpy/diagnose.py (1 hunks)
  • harpy/qc.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • harpy/qc.py
🧰 Additional context used
🪛 Ruff (0.8.2)
harpy/_validations.py

247-247: Use quiet == 2 instead of not quiet != 2

Replace with == operator

(SIM202)

harpy/diagnose.py

47-47: Do not use bare except

(E722)

⏰ Context from checks skipped due to timeout of 90000ms (17)
  • GitHub Check: align strobe
  • GitHub Check: simulate variants
  • GitHub Check: simulate linkedreads
  • GitHub Check: qc
  • GitHub Check: preflight
  • GitHub Check: phase
  • GitHub Check: sv naibr
  • GitHub Check: mpileup
  • GitHub Check: sv leviathan
  • GitHub Check: impute
  • GitHub Check: freebayes
  • GitHub Check: align EMA
  • GitHub Check: demux gen1
  • GitHub Check: Rebuild Container
  • GitHub Check: metassembly
  • GitHub Check: deconvolve
  • GitHub Check: align BWA
🔇 Additional comments (41)
harpy/_validations.py (2)

24-24: LGTM! Type annotations enhance code clarity and safety.

The addition of type annotations to function signatures improves code maintainability and enables better type checking.

Also applies to: 33-33, 42-42, 143-143, 173-173, 184-184, 198-198, 222-222, 240-240, 258-258, 301-301, 380-380, 425-425, 445-445, 475-475


306-310: LGTM! Added validation for minimum window size.

The check for minimum window size prevents performance issues that could arise from too small windows.

harpy/diagnose.py (2)

16-22: LGTM!

The configuration validation is thorough with proper path handling, clear error messages, and secure file reading.


26-46: LGTM!

The process execution and output handling implementation is robust:

  • Real-time output processing
  • Proper stream encoding
  • Graceful process termination
.github/workflows/tests.yml (37)

154-154: Verify updated demultiplex command.
The demultiplex step now calls:
harpy demultiplex gen1 --quiet 2 --schema test/demux/samples.schema test/demux/Undetermined_S0_L004_R* test/demux/Undetermined_S0_L004_I*
This appears to align with the new verbosity design. Please ensure downstream components expect the numeric verbosity value.


196-196: Confirm updated preflight bam command.
The preflight bam command now includes --quiet 2:
harpy preflight bam --quiet 2 test/bam
This update is consistent with the CLI changes. Verify that the new verbosity level integrates correctly with subsequent reporting.


234-234: Updated qc command with verbosity flag.
The command:
harpy qc -x "--low_complexity_filter" --quiet 2 test/fastq
now explicitly passes the verbosity flag. This change is in line with the updated CLI type-safety for verbosity.


237-237: Review updated qc options command.
The additional options command in qc now uses --quiet 2:
harpy qc -a auto -d -c 21,40,3,0 --quiet 2 test/fastq
The adjustment appears consistent across the workflow.


274-274: Check deconvolve command update.
The deconvolve step now invokes:
harpy deconvolve --quiet 2 test/fastq
This is consistent with the updated CLI verbosity style.


311-311: Verify BWA alignment command.
The updated command:
harpy align bwa --quiet 2 -g test/genome/genome.fasta.gz -x "-A 2" test/fastq
now uses --quiet 2. Please confirm that all related parameters (like the -x option) are still interpreted as intended.


349-349: Confirm EMA alignment update.
The EMA command now uses:
harpy align ema --quiet 2 --ema-bins 150 -g test/genome/genome.fasta.gz test/fastq
This update is in keeping with the new verbosity design.


387-387: Review strobe alignment command update.
The altered command:
harpy align strobe --quiet 2 -l 125 -g test/genome/genome.fasta.gz test/fastq
includes the new verbosity flag. Verify that the length parameter (-l 125) remains accurate for expected outcomes.


425-425: Verify SNP mpileup command update.
The mpileup command now reads:
harpy snp mpileup --quiet 2 -r test/positions.bed -g test/genome/genome.fasta.gz -x "--ignore-RG" test/bam
The addition of --quiet 2 appears appropriate—please ensure that the extra filtering flag (-x "--ignore-RG") functions as expected alongside it.


428-428: Review mpileup-pop command update.
The updated command is:
harpy snp mpileup --quiet 2 -r test/positions.bed -o SNP/poptest -g test/genome/genome.fasta.gz -p test/samples.groups test/bam
This change is consistent with the intended verbosity control.


466-466: Verify Freebayes command with updated verbosity.
The command now is:
harpy snp freebayes --quiet 2 -r test/positions.bed -g test/genome/genome.fasta.gz -x "-g 200" test/bam
Ensure that the additional option (-x "-g 200") is correctly parsed with the new verbosity input.


469-469: Review Freebayes-pop command update.
The update:
harpy snp freebayes --quiet 2 -r test/positions.bed -o SNP/poptest -g test/genome/genome.fasta.gz -p test/samples.groups test/bam
appears to adopt the new verbosity consistently.


507-507: Confirm impute command update.
The impute command now includes:
harpy impute --quiet 2 --vcf test/vcf/test.bcf -p test/stitch.params test/bam
This reflects the updated CLI pattern for verbosity.


511-511: Review impute from VCF command update.
The updated command:
harpy impute --quiet 2 --vcf-samples -o vcfImpute --vcf test/vcf/test.bcf -p test/stitch.params test/bam
shows consistent application of the --quiet 2 flag.


549-549: Verify phase command update with iteration parameter.
The phase step now runs as:
harpy phase --quiet 2 --vcf test/vcf/test.bcf -x "--max_iter 10001" test/bam
Ensure that the new verbosity flag does not interfere with the maximum iteration setting.


553-553: Confirm phase with indels command.
The command:
harpy phase --quiet 2 --vcf test/vcf/test.bcf -o phaseindel -g test/genome/genome.fasta.gz test/bam
appropriately includes the verbosity parameter.


559-559: Review phase from VCF command update.
The phase command now also features:
harpy phase --quiet 2 --vcf-samples -o phasevcf --vcf test/vcf/test.bcf test/bam
The inclusion of --quiet 2 is consistent with other commands in the phase group.


597-597: Verify Leviathan SV command update.
The updated Leviathan command reads:
harpy sv leviathan --quiet 2 -s 100 -b 1 -g test/genome/genome.fasta.gz -x "-M 2002" test/bam
Please confirm that the extra option (-x "-M 2002") is correctly merged with the new verbosity semantics.


602-602: Review Leviathan-pop command update.
The command:
harpy sv leviathan --quiet 2 -s 100 -b 1 -g test/genome/genome.fasta.gz -o SV/leviathanpop -p test/samples.groups test/bam
demonstrates consistency with the new verbosity flag.


640-640: Confirm naibr command update.
The naibr command now reads:
harpy sv naibr --quiet 2 -g test/genome/genome.fasta.gz -o SV/naibr -x "-min_sv 5000" test/bam_phased && rm -r Genome
This update uniformly applies the verbosity flag. Verify that the removal of the Genome directory (rm -r Genome) is safe and intended.


644-644: Review naibr pop command update.
The updated command:
harpy sv naibr --quiet 2 -g test/genome/genome.fasta.gz -o SV/pop -p test/samples.groups test/bam_phased && rm -r Genome
ensures consistent verbosity handling while performing the pop operation.


649-649: Verify naibr with phasing command update.
The command now is:
harpy sv naibr --quiet 2 -g test/genome/genome.fasta.gz -o SV/phase -v test/vcf/test.phased.bcf test/bam && rm -r Genome
Confirm that the multiple options (including VCF input and directory cleanup) continue to function correctly.


653-653: Review naibr pop with phasing command update.
The updated command:
harpy sv naibr --quiet 2 -g test/genome/genome.fasta.gz -o SV/phasepop -v test/vcf/test.phased.bcf -p test/samples.groups test/bam && rm -r Genome
ensures the new verbosity value is passed along and parameters remain consistent.


693-693: Confirm simulation of SNP/indel variations.
The simulation command now uses:
harpy simulate snpindel --quiet 2 --snp-count 10 --indel-count 10 -z 0.5 test/genome/genome.fasta.gz
This update appears to correctly adopt the new CLI verbosity approach.


694-694: Review simulated SNP/indel command with VCF output.
The command:
harpy simulate snpindel --quiet 2 --prefix Simulate/snpvcf --snp-vcf Simulate/snpindel/haplotype_1/sim.hap1.snp.vcf --indel-vcf Simulate/snpindel/haplotype_1/sim.hap1.indel.vcf test/genome/genome.fasta.gz
includes the updated --quiet 2 option. Ensure the file paths and prefixes are correctly resolved.


699-699: Verify inversion simulation command update.
The command:
harpy simulate inversion --quiet 2 --count 10 -z 0.5 test/genome/genome.fasta.gz
now uses the new verbosity flag. Confirm that the count and ratio parameters remain valid.


700-700: Review inversion simulation with VCF output.
The updated command:
harpy simulate inversion --quiet 2 --prefix Simulate/invvcf --vcf Simulate/inversion/haplotype_1/sim.hap1.inversion.vcf test/genome/genome.fasta.gz
continues the uniform use of --quiet 2.


705-705: Confirm CNV simulation command update.
The command:
harpy simulate cnv --quiet 2 --count 10 -z 0.5 test/genome/genome.fasta.gz
correctly adopts the new verbosity parameter.


706-706: Review CNV simulation command with VCF output.
The updated command:
harpy simulate cnv --quiet 2 --prefix Simulate/cnvvcf --vcf Simulate/cnv/haplotype_1/sim.hap1.cnv.vcf test/genome/genome.fasta.gz
shows consistent parameter handling.


711-711: Verify translocation simulation command update.
The translocation simulation command now reads:
harpy simulate translocation --quiet 2 --count 10 -z 0.5 test/genome/genome.fasta.gz
The new verbosity level is applied uniformly.


712-712: Review translocation simulation with VCF output.
The command:
harpy simulate translocation --quiet 2 --prefix Simulate/transvcf --vcf Simulate/translocation/haplotype_1/sim.hap1.translocation.vcf test/genome/genome.fasta.gz
incorporates the new verbosity setting as expected.


752-752: Confirm simulate linkedreads command update.
The command now is:
harpy simulate linkedreads --quiet 2 -t 4 -n 2 -b test/haplotag.bc -l 100 -p 50 test/genome/genome.fasta.gz test/genome/genome2.fasta.gz
This update reflects the new verbosity flag and maintains multi-parameter configuration correctly.


790-790: Review assembly command update.
The test command:
harpy assembly --quiet 2 -r 4000 test/fastq/sample1.*
now uses the updated verbosity flag. Confirm that the pattern matching for fastq files is still accurate.


793-793: Verify metassembly command update.
The updated metassembly command:
harpy metassembly --quiet 2 -r 4000 test/fastq/sample1.*
appears in line with the overall CLI changes.


796-796: Review metassembly without barcodes command update.
The command:
harpy metassembly --ignore-bx --quiet 2 -r 4000 test/fastq/sample1.*
integrates the new verbosity level with the existing --ignore-bx option.


835-835: Verify downsample bam command update.
The command:
harpy downsample -d 1 --random-seed 699 --quiet 2 test/bam/sample1.bam
now uses --quiet 2 along with a fixed random seed for reproducibility.


838-838: Review downsample fastq command update.
The updated command:
harpy downsample -d 1 --quiet 2 test/fastq/sample1.*gz
consistently applies the revised verbosity setting.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
harpy/simulate_linkedreads.py (1)

4-4: Remove unused import.

The yaml module is imported but never used in this file.

-import yaml
🧰 Tools
🪛 Ruff (0.8.2)

4-4: yaml imported but unused

Remove unused import: yaml

(F401)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6306850 and 5a108fe.

📒 Files selected for processing (4)
  • .github/filters.yml (1 hunks)
  • harpy/__main__.py (3 hunks)
  • harpy/simulate_linkedreads.py (1 hunks)
  • harpy/simulate_variants.py (5 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
harpy/simulate_linkedreads.py

4-4: yaml imported but unused

Remove unused import: yaml

(F401)


77-77: Local variable configs is assigned to but never used

Remove assignment to unused variable configs

(F841)

⏰ Context from checks skipped due to timeout of 90000ms (16)
  • GitHub Check: align strobe
  • GitHub Check: simulate variants
  • GitHub Check: qc
  • GitHub Check: preflight
  • GitHub Check: phase
  • GitHub Check: sv naibr
  • GitHub Check: mpileup
  • GitHub Check: sv leviathan
  • GitHub Check: impute
  • GitHub Check: freebayes
  • GitHub Check: align EMA
  • GitHub Check: demux gen1
  • GitHub Check: align BWA
  • GitHub Check: Rebuild Container
  • GitHub Check: deconvolve
  • GitHub Check: metassembly
🔇 Additional comments (5)
harpy/__main__.py (3)

5-15: LGTM! Well-structured command organization.

The new simulate command group is well-organized with clear documentation explaining the simulation capabilities and limitations.

Also applies to: 44-74


89-96: LGTM! Logical command organization.

The CLI commands are well-organized into three logical sections, with the new diagnose command appropriately placed in the Troubleshoot section.


98-114: LGTM! Well-structured command groups.

The command groups are well-organized with consistent sorting, making the CLI structure clear and maintainable.

harpy/simulate_variants.py (1)

118-118: LGTM! Consistent implementation of quiet option.

The quiet option has been consistently updated across all simulation commands to use the new choice-based approach with convert_to_int.

Also applies to: 241-241, 344-344, 451-451

.github/filters.yml (1)

145-145: LGTM! Correct path updates.

The file paths have been correctly updated to reflect the new file structure for simulation commands.

Also applies to: 153-153

@pdimens pdimens merged commit 9e2c68d into main Feb 17, 2025
20 checks passed
@pdimens pdimens deleted the internals branch February 17, 2025 19:03
@coderabbitai coderabbitai bot mentioned this pull request Feb 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant