-
Notifications
You must be signed in to change notification settings - Fork 186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: bowtie2 can now sort with samtools and picard #3518
base: master
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughThis pull request introduces significant updates to the Bowtie2 alignment tool's environment configuration and functionality. Key changes include modifications to the Conda environment files ( Changes
Possibly related PRs
Suggested reviewers
Tip CodeRabbit's docstrings feature is now available as part of our Early Access Program! Simply use the command Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (6)
bio/bowtie2/align/environment.linux-64.pin.txt
(1 hunks)bio/bowtie2/align/environment.yaml
(1 hunks)bio/bowtie2/align/meta.yaml
(1 hunks)bio/bowtie2/align/test/Snakefile
(1 hunks)bio/bowtie2/align/wrapper.py
(3 hunks)test_wrappers.py
(4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
bio/bowtie2/align/wrapper.py (2)
Pattern **/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self
argument of methods.
Do not suggest type annotation of the cls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return
statement.
Pattern **/wrapper.py
: Do not complain about use of undefined variable called snakemake
.
test_wrappers.py (1)
Pattern **/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self
argument of methods.
Do not suggest type annotation of the cls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return
statement.
🪛 Ruff (0.8.0)
bio/bowtie2/align/wrapper.py
18-18: Undefined name snakemake
(F821)
19-19: Undefined name snakemake
(F821)
20-20: Undefined name snakemake
(F821)
21-21: Undefined name snakemake
(F821)
22-22: Undefined name snakemake
(F821)
24-24: Undefined name snakemake
(F821)
26-26: Undefined name snakemake
(F821)
28-28: Undefined name snakemake
(F821)
29-29: Undefined name snakemake
(F821)
34-34: Undefined name snakemake
(F821)
39-39: Undefined name snakemake
(F821)
49-49: Undefined name snakemake
(F821)
49-49: Undefined name snakemake
(F821)
65-65: Undefined name snakemake
(F821)
103-103: Undefined name sort
(F821)
124-124: Undefined name snakemake
(F821)
125-125: Undefined name snakemake
(F821)
127-127: Undefined name snakemake
(F821)
128-128: Undefined name snakemake
(F821)
130-130: Undefined name snakemake
(F821)
132-132: Undefined name snakemake
(F821)
168-168: Undefined name snakemake
(F821)
🔇 Additional comments (5)
bio/bowtie2/align/environment.yaml (2)
Line range hint 1-4
: LGTM! Well-configured channels and version pinning strategy.
The channel configuration with conda-forge and bioconda, along with nodefaults, ensures reliable and reproducible package resolution. The explicit version pinning for all dependencies is a good practice for reproducibility.
7-7
: Verify picard-slim version.
Let's verify that picard-slim=3.3.0 is the latest stable version and check for any known issues.
✅ Verification successful
picard-slim version 3.3.0 is verified as the latest stable version
The verification confirms that:
- 3.3.0 is currently the latest version available in the bioconda channel
- Recent issues reported in the Picard repository are minor and don't indicate any critical problems with version 3.3.0
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check latest version and potential issues for picard-slim
# Get latest version from conda-forge
curl -s https://conda.anaconda.org/bioconda/noarch/repodata.json | jq -r '.packages | to_entries[] | select(.value.name=="picard-slim") | .value.version' | sort -V | tail -n 1
# Check for any reported issues in the past 6 months
gh api graphql -f query='
{
search(query: "repo:broadinstitute/picard is:issue created:>2024-06-01", type: ISSUE, first: 5) {
nodes {
... on Issue {
title
url
labels(first: 5) {
nodes {
name
}
}
}
}
}
}'
Length of output: 1316
test_wrappers.py (1)
2064-2079
: Test function test_bowtie2_align_samtools_coordinate_extension
is well-implemented
The new test function correctly iterates over all combinations of sorting programs, orders, and file extensions. The implementation is clean, and the function adheres to best practices for test definitions.
bio/bowtie2/align/environment.linux-64.pin.txt (2)
1-5
: LGTM! Good practice adding Conda version.
The addition of the Conda version header improves reproducibility by documenting the exact version used to create this environment.
75-89
: Verify version compatibility between tools.
The selected versions are recent and stable:
- bowtie2 2.5.4
- samtools 1.21
- picard-slim 3.3.0
Good choice using picard-slim to reduce environment size. However, let's verify compatibility between these versions.
✅ Verification successful
Tools are compatible and consistently used across the repository
Based on the environment files analysis:
- The selected versions (bowtie2 2.5.4, samtools 1.21, picard-slim 3.3.0) are consistently used together across multiple workflows in the repository
- These versions are the current standard in the codebase, showing they are well-tested together
- The Picard 3.3.0 release notes from earlier show no compatibility issues with samtools and includes htsjdk 4.1.3 which is compatible with samtools 1.21
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for known compatibility issues between the selected versions
# Check release notes and issues for compatibility information
gh api graphql -f query='
{
repository(owner: "broadinstitute", name: "picard") {
release(tagName: "3.3.0") {
description
}
}
}
'
# Check if there are any reported issues between these specific versions
gh api graphql -f query='
{
search(query: "repo:samtools/samtools repo:broadinstitute/picard is:issue samtools 1.21 picard 3.3.0", type: ISSUE) {
nodes {
... on Issue {
title
url
state
}
}
}
}
'
Length of output: 2460
Script:
#!/bin/bash
# Let's check for any compatibility requirements or issues in the wrapper script and workflow
# Check wrapper script for version requirements
fd "bowtie2.*\.py" --exec cat {}
# Check workflow for version requirements
fd "bowtie2.*\.smk|bowtie2.*\.snakefile" -I --exec cat {}
# Check if there are any version constraints in environment files
fd "environment.*\.ya?ml" --exec grep -A5 -B5 "bowtie2\|samtools\|picard" {}
# Look for any version-specific code or comments
rg -A3 -B3 "version|Version" --glob "bowtie2*"
Length of output: 8907
https://conda.anaconda.org/conda-forge/noarch/font-ttf-dejavu-sans-mono-2.37-hab24e00_0.tar.bz2#0c96522c6bdaed4b1566d11387caaf45 | ||
https://conda.anaconda.org/conda-forge/noarch/font-ttf-inconsolata-3.000-h77eed37_0.tar.bz2#34893075a5c9e55cdafac56607368fc6 | ||
https://conda.anaconda.org/conda-forge/noarch/font-ttf-source-code-pro-2.038-h77eed37_0.tar.bz2#4d59c254e01d9cde7957100457e2d5fb | ||
https://conda.anaconda.org/conda-forge/noarch/font-ttf-ubuntu-0.83-h77eed37_3.conda#49023d73832ef61042f6a237cb2687e7 | ||
https://conda.anaconda.org/conda-forge/linux-64/ld_impl_linux-64-2.43-h712a8e2_2.conda#048b02e3962f066da18efe3a21b77672 | ||
https://conda.anaconda.org/conda-forge/linux-64/python_abi-3.13-5_cp313.conda#381bbd2a92c863f640a55b6ff3c35161 | ||
https://conda.anaconda.org/conda-forge/noarch/tzdata-2024b-hc8b5060_0.conda#8ac3367aafb1cc0a068483c580af8015 | ||
https://conda.anaconda.org/conda-forge/noarch/fonts-conda-forge-1-0.tar.bz2#f766549260d6815b0c52253f1fb1bb29 | ||
https://conda.anaconda.org/conda-forge/linux-64/libgomp-14.2.0-h77fa898_1.conda#cc3573974587f12dda90d96e3e55a702 | ||
https://conda.anaconda.org/conda-forge/linux-64/_openmp_mutex-4.5-2_gnu.tar.bz2#73aaf86a425cc6e73fcf236a5a46396d | ||
https://conda.anaconda.org/conda-forge/linux-64/libgcc-14.1.0-h77fa898_1.conda#002ef4463dd1e2b44a94a4ace468f5d2 | ||
https://conda.anaconda.org/conda-forge/linux-64/libexpat-2.6.3-h5888daf_0.conda#59f4c43bb1b5ef1c71946ff2cbf59524 | ||
https://conda.anaconda.org/conda-forge/linux-64/libgcc-ng-14.1.0-h69a702a_1.conda#1efc0ad219877a73ef977af7dbb51f17 | ||
https://conda.anaconda.org/conda-forge/linux-64/libstdcxx-14.1.0-hc0a3c3a_1.conda#9dbb9699ea467983ba8a4ba89b08b066 | ||
https://conda.anaconda.org/conda-forge/linux-64/openssl-3.3.2-hb9d3cd8_0.conda#4d638782050ab6faa27275bed57e9b4e | ||
https://conda.anaconda.org/conda-forge/noarch/fonts-conda-ecosystem-1-0.tar.bz2#fee5683a3f04bd15cbd8318b096a27ab | ||
https://conda.anaconda.org/conda-forge/linux-64/libgcc-14.2.0-h77fa898_1.conda#3cb76c3f10d3bc7f1105b2fc9db984df | ||
https://conda.anaconda.org/conda-forge/linux-64/alsa-lib-1.2.13-hb9d3cd8_0.conda#ae1370588aa6a5157c34c73e9bbb36a0 | ||
https://conda.anaconda.org/conda-forge/linux-64/c-ares-1.34.3-hb9d3cd8_1.conda#ee228789a85f961d14567252a03e725f | ||
https://conda.anaconda.org/conda-forge/linux-64/libexpat-2.6.4-h5888daf_0.conda#db833e03127376d461e1e13e76f09b6c | ||
https://conda.anaconda.org/conda-forge/linux-64/libgcc-ng-14.2.0-h69a702a_1.conda#e39480b9ca41323497b05492a63bc35b | ||
https://conda.anaconda.org/conda-forge/linux-64/libstdcxx-14.2.0-hc0a3c3a_1.conda#234a5554c53625688d51062645337328 | ||
https://conda.anaconda.org/conda-forge/linux-64/libzlib-1.3.1-hb9d3cd8_2.conda#edb0dca6bc32e4f4789199455a1dbeb8 | ||
https://conda.anaconda.org/conda-forge/linux-64/openssl-3.4.0-hb9d3cd8_0.conda#23cc74f77eb99315c0360ec3533147a9 | ||
https://conda.anaconda.org/conda-forge/linux-64/pthread-stubs-0.4-hb9d3cd8_1002.conda#b3c17d95b5a10c6e64a21fa17573e70e | ||
https://conda.anaconda.org/conda-forge/linux-64/xorg-libice-1.1.1-hb9d3cd8_1.conda#19608a9656912805b2b9a2f6bd257b04 | ||
https://conda.anaconda.org/conda-forge/linux-64/xorg-libxau-1.0.11-hb9d3cd8_1.conda#77cbc488235ebbaab2b6e912d3934bae | ||
https://conda.anaconda.org/conda-forge/linux-64/xorg-libxdmcp-1.1.5-hb9d3cd8_0.conda#8035c64cb77ed555e3f150b7b3972480 | ||
https://conda.anaconda.org/conda-forge/linux-64/xorg-xorgproto-2024.1-hb9d3cd8_1.conda#7c21106b851ec72c037b162c216d8f05 | ||
https://conda.anaconda.org/conda-forge/linux-64/bzip2-1.0.8-h4bc722e_7.conda#62ee74e96c5ebb0af99386de58cf9553 | ||
https://conda.anaconda.org/conda-forge/linux-64/c-ares-1.33.1-heb4867d_0.conda#0d3c60291342c0c025db231353376dfb | ||
https://conda.anaconda.org/conda-forge/linux-64/giflib-5.2.2-hd590300_0.conda#3bf7b9fd5a7136126e0234db4b87c8b6 | ||
https://conda.anaconda.org/conda-forge/linux-64/keyutils-1.6.1-h166bdaf_0.tar.bz2#30186d27e2c9fa62b45fb1476b7200e3 | ||
https://conda.anaconda.org/conda-forge/linux-64/libdeflate-1.20-hd590300_0.conda#8e88f9389f1165d7c0936fe40d9a9a79 | ||
https://conda.anaconda.org/conda-forge/linux-64/libdeflate-1.21-h4bc722e_0.conda#36ce76665bf67f5aac36be7a0d21b7f3 | ||
https://conda.anaconda.org/conda-forge/linux-64/libev-4.33-hd590300_2.conda#172bf1cd1ff8629f2b1179945ed45055 | ||
https://conda.anaconda.org/conda-forge/linux-64/libffi-3.4.2-h7f98852_5.tar.bz2#d645c6d2ac96843a2bfaccd2d62b3ac3 | ||
https://conda.anaconda.org/conda-forge/linux-64/libnsl-2.0.1-hd590300_0.conda#30fd6e37fe21f86f4bd26d6ee73eeec7 | ||
https://conda.anaconda.org/conda-forge/linux-64/libstdcxx-ng-14.1.0-h4852527_1.conda#bd2598399a70bb86d8218e95548d735e | ||
https://conda.anaconda.org/conda-forge/linux-64/libiconv-1.17-hd590300_2.conda#d66573916ffcf376178462f1b61c941e | ||
https://conda.anaconda.org/conda-forge/linux-64/libjpeg-turbo-3.0.0-hd590300_1.conda#ea25936bb4080d843790b586850f82b8 | ||
https://conda.anaconda.org/conda-forge/linux-64/libmpdec-4.0.0-h4bc722e_0.conda#aeb98fdeb2e8f25d43ef71fbacbeec80 | ||
https://conda.anaconda.org/conda-forge/linux-64/libpng-1.6.44-hadc24fc_0.conda#f4cc49d7aa68316213e4b12be35308d1 | ||
https://conda.anaconda.org/conda-forge/linux-64/libsqlite-3.47.0-hadc24fc_1.conda#b6f02b52a174e612e89548f4663ce56a | ||
https://conda.anaconda.org/conda-forge/linux-64/libssh2-1.11.1-hf672d98_0.conda#be2de152d8073ef1c01b7728475f2fe7 | ||
https://conda.anaconda.org/conda-forge/linux-64/libstdcxx-ng-14.2.0-h4852527_1.conda#8371ac6457591af2cf6159439c1fd051 | ||
https://conda.anaconda.org/conda-forge/linux-64/libuuid-2.38.1-h0b41bf4_0.conda#40b61aab5c7ba9ff276c41cfffe6b80b | ||
https://conda.anaconda.org/conda-forge/linux-64/libwebp-base-1.4.0-hd590300_0.conda#b26e8aa824079e1be0294e7152ca4559 | ||
https://conda.anaconda.org/conda-forge/linux-64/libxcb-1.17.0-h8a09558_0.conda#92ed62436b625154323d40d5f2f11dd7 | ||
https://conda.anaconda.org/conda-forge/linux-64/libxcrypt-4.4.36-hd590300_1.conda#5aa797f8787fe7a17d1b0821485b5adc | ||
https://conda.anaconda.org/conda-forge/linux-64/libzlib-1.3.1-h4ab18f5_1.conda#57d7dc60e9325e3de37ff8dffd18e814 | ||
https://conda.anaconda.org/conda-forge/linux-64/ncurses-6.5-he02047a_1.conda#70caf8bb6cf39a0b6b7efc885f51c0fe | ||
https://conda.anaconda.org/conda-forge/linux-64/tk-8.6.13-noxft_h4845f30_101.conda#d453b98d9c83e71da0741bb0ff4d76bc | ||
https://conda.anaconda.org/conda-forge/linux-64/xz-5.2.6-h166bdaf_0.tar.bz2#2161070d867d1b1204ea749c8eec4ef0 | ||
https://conda.anaconda.org/conda-forge/linux-64/zlib-1.3.1-hb9d3cd8_2.conda#c9f075ab2f33b3bbee9e62d4ad0a6cd8 | ||
https://conda.anaconda.org/conda-forge/linux-64/freetype-2.12.1-h267a509_2.conda#9ae35c3d96db2c94ce0cef86efdfa2cb | ||
https://conda.anaconda.org/conda-forge/linux-64/graphite2-1.3.13-h59595ed_1003.conda#f87c7b7c2cb45f323ffbce941c78ab7c | ||
https://conda.anaconda.org/conda-forge/linux-64/icu-75.1-he02047a_0.conda#8b189310083baabfb622af68fd9d3ae3 | ||
https://conda.anaconda.org/conda-forge/linux-64/lerc-4.0.0-h27087fc_0.tar.bz2#76bbff344f0134279f225174e9064c8f | ||
https://conda.anaconda.org/conda-forge/linux-64/libedit-3.1.20191231-he28a2e2_2.tar.bz2#4d331e44109e3f0e19b4cb8f9b82f3e1 | ||
https://conda.anaconda.org/conda-forge/linux-64/libnghttp2-1.58.0-h47da74e_1.conda#700ac6ea6d53d5510591c4344d5c989a | ||
https://conda.anaconda.org/conda-forge/linux-64/libsqlite-3.46.1-hadc24fc_0.conda#36f79405ab16bf271edb55b213836dac | ||
https://conda.anaconda.org/conda-forge/linux-64/libssh2-1.11.0-h0841786_0.conda#1f5a58e686b13bcfde88b93f547d23fe | ||
https://conda.anaconda.org/conda-forge/linux-64/libnghttp2-1.64.0-h161d5f1_0.conda#19e57602824042dfd0446292ef90488b | ||
https://conda.anaconda.org/conda-forge/linux-64/pcre2-10.44-hba22ea6_2.conda#df359c09c41cd186fffb93a2d87aa6f5 | ||
https://conda.anaconda.org/conda-forge/linux-64/perl-5.32.1-7_hd590300_perl5.conda#f2cfec9406850991f4e3d960cc9e3321 | ||
https://conda.anaconda.org/conda-forge/linux-64/pixman-0.43.2-h59595ed_0.conda#71004cbf7924e19c02746ccde9fd7123 | ||
https://conda.anaconda.org/conda-forge/linux-64/readline-8.2-h8228510_1.conda#47d31b792659ce70f470b5c82fdfb7a4 | ||
https://conda.anaconda.org/conda-forge/linux-64/tk-8.6.13-noxft_h4845f30_101.conda#d453b98d9c83e71da0741bb0ff4d76bc | ||
https://conda.anaconda.org/conda-forge/linux-64/xorg-libsm-1.2.4-he73a12e_1.conda#05a8ea5f446de33006171a7afe6ae857 | ||
https://conda.anaconda.org/conda-forge/linux-64/xorg-libx11-1.8.10-h4f16b4b_0.conda#0b666058a179b744a622d0a4a0c56353 | ||
https://conda.anaconda.org/conda-forge/linux-64/zstd-1.5.6-ha6fb4c9_0.conda#4d056880988120e29d75bfff282e0f45 | ||
https://conda.anaconda.org/conda-forge/linux-64/fontconfig-2.15.0-h7e30c49_1.conda#8f5b0b297b59e1ac160ad4beec99dbee | ||
https://conda.anaconda.org/conda-forge/linux-64/krb5-1.21.3-h659f571_0.conda#3f43953b7d3fb3aaa1d0d0723d91e368 | ||
https://conda.anaconda.org/conda-forge/linux-64/python-3.12.5-h2ad013b_0_cpython.conda#9c56c4df45f6571b13111d8df2448692 | ||
https://conda.anaconda.org/conda-forge/linux-64/libglib-2.82.2-h2ff4ddf_0.conda#13e8e54035ddd2b91875ba399f0f7c04 | ||
https://conda.anaconda.org/conda-forge/linux-64/libtiff-4.7.0-h6565414_0.conda#80eaf80d84668fa5620ac9ec1b4bf56f | ||
https://conda.anaconda.org/conda-forge/linux-64/python-3.13.0-h9ebbce0_101_cp313.conda#f4fea9d5bb3f2e61a39950a7ab70ee4e | ||
https://conda.anaconda.org/conda-forge/linux-64/xorg-libxext-1.3.6-hb9d3cd8_0.conda#febbab7d15033c913d53c7a2c102309d | ||
https://conda.anaconda.org/conda-forge/linux-64/xorg-libxfixes-6.0.1-hb9d3cd8_0.conda#4bdb303603e9821baf5fe5fdff1dc8f8 | ||
https://conda.anaconda.org/conda-forge/linux-64/xorg-libxrender-0.9.11-hb9d3cd8_1.conda#a7a49a8b85122b49214798321e2e96b4 | ||
https://conda.anaconda.org/conda-forge/linux-64/xorg-libxt-1.3.1-hb9d3cd8_0.conda#279b0de5f6ba95457190a1c459a64e31 | ||
https://conda.anaconda.org/bioconda/linux-64/bowtie2-2.5.4-h7071971_4.tar.bz2#69822858766e6c8b12ae90d78d54d8ea |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider optimizing dependencies for CLI usage.
The environment includes several GUI-related packages that may be unnecessary for a command-line bioinformatics tool:
- Font packages (dejavu, inconsolata, ubuntu, etc.)
- X11 libraries (xorg-*)
- Graphics libraries (cairo, pixman)
These dependencies could increase the environment size and installation time without providing value for the core functionality.
Consider removing these packages if they're not direct requirements for Bowtie2, Samtools, or Picard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (6)
bio/bowtie2/align/environment.linux-64.pin.txt (1)
16-55
: Consider optimizing system library dependencies.The environment includes a comprehensive set of system libraries. While these are likely pulled in as dependencies, consider:
- Evaluating if all these libraries are necessary for the core functionality
- Testing if any can be marked as optional dependencies
bio/bowtie2/align/wrapper.py (2)
42-45
: Fix typo in error messageThere's a typo in the error message: "declarad" should be "declared".
- f"Missing required indices: {missing_idx} declarad as input.idx.\n" + f"Missing required indices: {missing_idx} declared as input.idx.\n"
171-179
: Consider resource management optimizationThe wrapper combines alignment and sorting in a single step using pipes. While this is memory-efficient, it could benefit from some resource management considerations:
- For large datasets, consider adding a parameter to control the memory usage of sorting tools (e.g.,
--memory
for samtools,MAX_RECORDS_IN_RAM
for Picard)- Consider adding progress reporting for long-running operations
Would you like help implementing these resource management optimizations?
bio/bowtie2/align/meta.yaml (1)
Line range hint
29-31
: Enhance documentation for new sorting parametersThe notes section should document the new sorting functionality and parameters.
Add documentation for the new parameters:
* The `extra` param allows for additional arguments for bowtie2. + * The `sort_program` param enables output sorting using either 'samtools' or 'picard'. Default is 'none'. + * The `sort_extra` param allows passing additional arguments to the selected sorting program. + * The `sort_order` param specifies the sorting order: 'coordinate' (default) or 'queryname'. * This wrapper uses an inner pipe. Make sure to use at least two threads in your Snakefile.bio/bowtie2/align/test/Snakefile (2)
140-140
: Consider dynamic thread allocation for sortingThe current fixed thread count might not be optimal for all sorting operations. Consider adjusting threads based on the sorting program.
- threads: 8 # Use at least two threads + threads: lambda wildcards, input: 16 if wildcards.program != "none" else 8 # Increase threads for sorting operations
138-139
: Enhance log file structureConsider organizing log files in subdirectories based on sorting configuration for better maintainability.
- log: - f"logs/bowtie2/{{sample}}.{program}_{order}_{extension}.log", + log: + f"logs/bowtie2/{program}/{order}/{{sample}}.{extension}.log",
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (5)
bio/bowtie2/align/environment.linux-64.pin.txt
(1 hunks)bio/bowtie2/align/environment.yaml
(1 hunks)bio/bowtie2/align/meta.yaml
(2 hunks)bio/bowtie2/align/test/Snakefile
(1 hunks)bio/bowtie2/align/wrapper.py
(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- bio/bowtie2/align/environment.yaml
🧰 Additional context used
📓 Path-based instructions (1)
bio/bowtie2/align/wrapper.py (2)
Pattern **/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self
argument of methods.
Do not suggest type annotation of the cls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return
statement.
Pattern **/wrapper.py
: Do not complain about use of undefined variable called snakemake
.
🪛 Ruff (0.8.0)
bio/bowtie2/align/wrapper.py
18-18: Undefined name snakemake
(F821)
19-19: Undefined name snakemake
(F821)
20-20: Undefined name snakemake
(F821)
21-21: Undefined name snakemake
(F821)
22-22: Undefined name snakemake
(F821)
24-24: Undefined name snakemake
(F821)
26-26: Undefined name snakemake
(F821)
28-28: Undefined name snakemake
(F821)
29-29: Undefined name snakemake
(F821)
34-34: Undefined name snakemake
(F821)
39-39: Undefined name snakemake
(F821)
49-49: Undefined name snakemake
(F821)
49-49: Undefined name snakemake
(F821)
65-65: Undefined name snakemake
(F821)
124-124: Undefined name snakemake
(F821)
125-125: Undefined name snakemake
(F821)
127-127: Undefined name snakemake
(F821)
128-128: Undefined name snakemake
(F821)
130-130: Undefined name snakemake
(F821)
132-132: Undefined name snakemake
(F821)
168-168: Undefined name snakemake
(F821)
🔇 Additional comments (7)
bio/bowtie2/align/environment.linux-64.pin.txt (2)
8-11
: Consider optimizing dependencies for CLI usage.
The environment includes several GUI-related packages that may be unnecessary for a command-line bioinformatics tool:
- Font packages (dejavu, inconsolata, ubuntu, etc.)
- X11 libraries (xorg-*)
- Graphics libraries (cairo, pixman)
These dependencies could increase the environment size and installation time without providing value for the core functionality.
Also applies to: 56-78
79-79
: Verify compatibility between bioinformatics tools.
The core bioinformatics tools are present with appropriate versions:
- bowtie2 2.5.4
- samtools 1.21
- picard-slim 3.3.0
Please verify that these specific versions are compatible with each other for the sorting functionality, particularly when using Picard for query name sorting.
Consider documenting the version compatibility matrix in the wrapper's documentation to help users understand which versions work well together.
Also applies to: 89-89, 92-92, 93-93
✅ Verification successful
Based on the search results, I can now generate a final response:
The specified versions of bowtie2, samtools, and picard are compatible and commonly used together
The codebase shows extensive integration testing and usage of these tools together:
- Multiple test files demonstrate bowtie2 2.5.x working with samtools 1.x and picard 3.x
- No compatibility issues or errors are documented between these versions
- The tools are frequently used together in various workflows and pipelines
- The version combinations are actively tested in the CI pipeline
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify compatibility between tool versions
# Check if there are any known compatibility issues between these versions
gh api graphql -f query='
{
search(query: "repo:broadinstitute/picard is:issue \"bowtie2 2.5.4\" OR \"samtools 1.21\"", type: ISSUE) {
nodes {
... on Issue {
title
url
state
}
}
}
}'
# Check if there are any reported issues with these specific versions
rg -l "compatibility|issue|error" | xargs rg -A 5 "bowtie2.*2\.5\.4|samtools.*1\.21|picard.*3\.3\.0"
Length of output: 636
Script:
#!/bin/bash
# Let's try a different approach to verify compatibility
# 1. Check for any configuration files or workflow files that might show these tools working together
fd -e yaml -e yml -e config -e json | xargs rg -l "bowtie2|samtools|picard"
# 2. Look for any documented version constraints or requirements
fd "README|CHANGELOG|requirements" | xargs rg -i "version|compatibility|require"
# 3. Check for any test files that might use these tools together
fd -e test -e spec | xargs rg -l "bowtie2|samtools|picard"
# 4. Look for workflow definitions that might show tool interactions
fd "workflow|pipeline|snakefile|nextflow" -i | xargs rg -l "bowtie2|samtools|picard"
# 5. Check if there are any documented issues in issue templates or docs
fd "ISSUE_TEMPLATE|docs" -i | xargs rg -i "compatibility|version"
Length of output: 34600
bio/bowtie2/align/wrapper.py (2)
103-103
:
Fix undefined variable sort
in error message
The error message uses undefined variable sort
instead of sort_program
.
- raise ValueError(f"Unexpected value for params.sort ({sort})")
+ raise ValueError(f"Unexpected value for params.sort_program ({sort_program})")
86-91
:
Clarify thread allocation error message for Picard
The error message for Picard thread allocation is unclear. It should explain that Picard requires 2 threads minimum - one for bowtie2 and one for Picard.
bowtie2_threads = bowtie2_threads - 1
if bowtie2_threads <= 0:
raise ValueError(
- "Not enough threads requested. This wrapper requires exactly one more."
+ "Not enough threads requested. When using Picard, at least 2 threads are required (1 for bowtie2, 1 for Picard)."
)
bio/bowtie2/align/meta.yaml (2)
8-8
: LGTM: Author addition
Addition of Jorge Langa as an author is appropriate given the contributions.
25-27
: LGTM: New sorting parameters
The new parameters align well with the PR objectives:
sort_program
: Enables sorting with samtools or picardsort_extra
: Allows customization of sorting behaviorsort_order
: Provides coordinate/queryname sorting options
bio/bowtie2/align/test/Snakefile (1)
106-109
: LGTM: Comprehensive dynamic rule generation
The nested loops effectively generate rules for all possible combinations of sorting configurations.
Verify the generated rule combinations:
✅ Verification successful
LGTM: Dynamic rule generation is correctly implemented
The nested loops generate 18 distinct rules (2 orders × 3 extensions × 3 programs) with proper:
- Naming convention:
bowtie2_align_{program}_{order}_{extension}
- Required inputs including reference genome for CRAM output
- Appropriate parameters for sorting configuration
- Structured log file naming that reflects the configuration
The implementation ensures comprehensive coverage of all sorting configurations while maintaining consistent rule structure.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if all expected rule combinations are generated
# List all rules in the Snakefile
snakemake --list-rules 2>/dev/null | grep "bowtie2_align_" | sort
Length of output: 53
Script:
#!/bin/bash
# Let's examine the Snakefile content directly to verify the rule generation
cat bio/bowtie2/align/test/Snakefile
Length of output: 4074
Script:
#!/bin/bash
# Search for rule definitions in the Snakefile
rg "rule bowtie2_align_" bio/bowtie2/align/test/Snakefile -A 5
Length of output: 64
Script:
#!/bin/bash
# Check the full context around the loops
rg -B 5 -A 10 'for order in \["coordinate", "queryname"\]' bio/bowtie2/align/test/Snakefile
Length of output: 614
bio/bowtie2/align/test/Snakefile
Outdated
ref="genome.fasta", | ||
output: | ||
f"mapped_idx/{{sample}}.{program}_{order}.{extension}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Based on the analysis of both wrapper.py and the test Snakefile, I can now provide the final response:
CRAM output handling needs improvement
The wrapper doesn't explicitly validate the presence of the reference genome for CRAM output. While the test Snakefile correctly includes ref="genome.fasta"
in CRAM-related rules, the wrapper.py should:
- Validate reference presence when output is CRAM format
- Properly pass the reference to samtools for CRAM generation
Specific issues:
- In wrapper.py, line 54-55: The CRAM conversion lacks
-T
reference parameter in the samtools view command - No validation check for reference input when CRAM output is detected
🔗 Analysis chain
Verify CRAM output handling
The reference genome is required for CRAM output. Ensure proper error handling when ref is missing.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if wrapper.py handles missing reference for CRAM output
# Search for CRAM-related error handling
rg -A 5 "\.cram" bio/bowtie2/align/wrapper.py
Length of output: 47
Script:
#!/bin/bash
# Let's check the wrapper.py content to understand how output formats are handled
cat bio/bowtie2/align/wrapper.py
Length of output: 5582
Script:
#!/bin/bash
# Let's check if there's any reference handling for CRAM output in the Snakefile
cat bio/bowtie2/align/test/Snakefile
Length of output: 4074
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
bio/bowtie2/align/wrapper.py (1)
41-47
: Fix typo in error messageThere's a typo in the error message: "declarad" should be "declared".
- f"Missing required indices: {missing_idx} declarad as input.idx.\n" + f"Missing required indices: {missing_idx} declared as input.idx.\n"🧰 Tools
🪛 Ruff (0.8.0)
41-41: Undefined name
snakemake
(F821)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
bio/bowtie2/align/wrapper.py
(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
bio/bowtie2/align/wrapper.py (2)
Pattern **/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self
argument of methods.
Do not suggest type annotation of the cls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return
statement.
Pattern **/wrapper.py
: Do not complain about use of undefined variable called snakemake
.
🪛 Ruff (0.8.0)
bio/bowtie2/align/wrapper.py
18-18: Undefined name snakemake
(F821)
19-19: Undefined name snakemake
(F821)
20-20: Undefined name snakemake
(F821)
21-21: Undefined name snakemake
(F821)
22-22: Undefined name snakemake
(F821)
24-24: Undefined name snakemake
(F821)
26-26: Undefined name snakemake
(F821)
28-28: Undefined name snakemake
(F821)
31-31: Undefined name snakemake
(F821)
36-36: Undefined name snakemake
(F821)
41-41: Undefined name snakemake
(F821)
51-51: Undefined name snakemake
(F821)
51-51: Undefined name snakemake
(F821)
67-67: Undefined name snakemake
(F821)
126-126: Undefined name snakemake
(F821)
127-127: Undefined name snakemake
(F821)
129-129: Undefined name snakemake
(F821)
130-130: Undefined name snakemake
(F821)
132-132: Undefined name snakemake
(F821)
134-134: Undefined name snakemake
(F821)
170-170: Undefined name snakemake
(F821)
🔇 Additional comments (6)
bio/bowtie2/align/wrapper.py (6)
1-12
: LGTM: Appropriate imports for the new functionality
The imports are well-organized and include necessary modules for the new sorting capabilities.
13-32
: LGTM: Well-structured initialization with proper thread validation
The initialization code properly defines required indices and handles thread allocation with appropriate validation.
🧰 Tools
🪛 Ruff (0.8.0)
18-18: Undefined name snakemake
(F821)
19-19: Undefined name snakemake
(F821)
20-20: Undefined name snakemake
(F821)
21-21: Undefined name snakemake
(F821)
22-22: Undefined name snakemake
(F821)
24-24: Undefined name snakemake
(F821)
26-26: Undefined name snakemake
(F821)
28-28: Undefined name snakemake
(F821)
31-31: Undefined name snakemake
(F821)
50-59
: LGTM: Comprehensive input validation
The input validation properly checks for valid sample counts and sort order values.
🧰 Tools
🪛 Ruff (0.8.0)
51-51: Undefined name snakemake
(F821)
51-51: Undefined name snakemake
(F821)
88-93
: Clarify Picard thread requirement message
The error message doesn't clearly explain that Picard requires at least 2 threads total, with 1 thread reserved for Picard.
- "Not enough threads requested. This wrapper requires exactly one more."
+ "Not enough threads requested. Picard sorting requires at least 2 threads (1 for Bowtie2, 1 for Picard)."
104-105
: Fix variable name in error message
The error message uses sort
instead of sort_program
.
- raise ValueError(f"Unexpected value for params.sort ({sort_program})")
+ raise ValueError(f"Unexpected value for params.sort_program ({sort_program})")
173-181
: LGTM: Clean command construction with proper temporary directory handling
The execution logic properly handles temporary directories and constructs the command with all necessary parameters.
bio/bowtie2/align/environment.yaml
Outdated
- bowtie2 =2.5.4 | ||
- picard-slim =3.3.0 | ||
- bowtie2 =2.5 | ||
- picard-slim =3.3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- picard-slim =3.3 | |
- picard-slim =3.3.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (4)
bio/bowtie2/align/wrapper.py (4)
14-23
: Add edge case handling to get_extension functionThe function should handle edge cases such as empty filenames or files without extensions to prevent potential runtime errors.
Consider this improved implementation:
def get_extension(filename: str) -> str: """ Return file format since Bowtie2 reads files that could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2). """ + if not filename: + raise ValueError("Empty filename") + parts = filename.split(".") + if len(parts) < 2: + raise ValueError(f"No extension found in filename: {filename}") if filename.endswith((".gz", ".bz2")): - return filename.split(".")[-2].lower() + return parts[-2].lower() if len(parts) > 2 else "" - return filename.split(".")[-1].lower() + return parts[-1].lower()
45-47
: Fix typo in error messageThere's a typo in the error message: "declarad" should be "declared".
- f"Missing required indices: {missing_idx} declarad as input.idx.\n" + f"Missing required indices: {missing_idx} declared as input.idx.\n"
155-183
: Consider using constants for repeated string literalsThe code uses string literals like "none", "samtools", "picard" multiple times. Consider defining these as constants at the module level to improve maintainability.
+# Constants for sort programs +SORT_PROGRAM_NONE = "none" +SORT_PROGRAM_SAMTOOLS = "samtools" +SORT_PROGRAM_PICARD = "picard" + match sort_program: - case "none": + case SORT_PROGRAM_NONE: # ... - case "samtools": + case SORT_PROGRAM_SAMTOOLS: # ... - case "picard": + case SORT_PROGRAM_PICARD: # ...
111-116
: Consider extracting CRAM validation to a separate functionThe CRAM validation logic could be moved to a separate function to improve code organization and reusability.
+def validate_cram_requirements(ref: str | None, ref_fai: str | None) -> None: + """Validate that reference files are provided for CRAM output.""" + if ref is None or ref_fai is None: + raise ValueError( + "Reference file and index are required for CRAM output. " + "Please specify them as input.ref and input.ref_fai." + ) + if bam_extension == "cram": - if REF is None or REF_FAI is None: - raise ValueError( - "Reference file and index are required for CRAM output." - "Please specify them as input.ref and input.ref_fai." - ) + validate_cram_requirements(REF, REF_FAI)🧰 Tools
🪛 Ruff (0.8.0)
111-112: Use a single
if
statement instead of nestedif
statementsCombine
if
statements usingand
(SIM102)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
bio/bowtie2/align/wrapper.py
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
bio/bowtie2/align/wrapper.py (2)
Pattern **/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self
argument of methods.
Do not suggest type annotation of the cls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return
statement.
Pattern **/wrapper.py
: Do not complain about use of undefined variable called snakemake
.
🪛 Ruff (0.8.0)
bio/bowtie2/align/wrapper.py
26-26: Undefined name snakemake
(F821)
36-36: Undefined name snakemake
(F821)
41-41: Undefined name snakemake
(F821)
51-51: Undefined name snakemake
(F821)
54-54: Undefined name snakemake
(F821)
58-58: Undefined name snakemake
(F821)
59-59: Undefined name snakemake
(F821)
60-60: Undefined name snakemake
(F821)
61-61: Undefined name snakemake
(F821)
62-62: Undefined name snakemake
(F821)
63-63: Undefined name snakemake
(F821)
69-69: Undefined name snakemake
(F821)
72-72: Undefined name snakemake
(F821)
73-73: Undefined name snakemake
(F821)
79-79: Undefined name snakemake
(F821)
80-80: Undefined name snakemake
(F821)
81-81: Undefined name snakemake
(F821)
82-82: Undefined name snakemake
(F821)
83-83: Undefined name snakemake
(F821)
85-85: Undefined name snakemake
(F821)
87-87: Undefined name snakemake
(F821)
111-112: Use a single if
statement instead of nested if
statements
Combine if
statements using and
(SIM102)
124-127: Use ternary operator CMD_INPUT = f"--interleaved {SAMPLE}" if interleaved else f"-U {SAMPLE}"
instead of if
-else
-block
Replace if
-else
-block with CMD_INPUT = f"--interleaved {SAMPLE}" if interleaved else f"-U {SAMPLE}"
(SIM108)
197-197: Undefined name snakemake
(F821)
🔇 Additional comments (2)
bio/bowtie2/align/wrapper.py (2)
1-13
: LGTM! Imports are appropriate for the new functionality
The imports are well-organized and include all necessary modules for the enhanced sorting capabilities.
72-74
: Ensure proper thread allocation for sorting
The current implementation could result in 0 threads for sorting if snakemake.threads
is 1.
🧰 Tools
🪛 Ruff (0.8.0)
72-72: Undefined name snakemake
(F821)
73-73: Undefined name snakemake
(F821)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
bio/bowtie2/align/test/Snakefile (2)
107-144
: Consider validating program and order combinationsThe current implementation allows all combinations of
program
andorder
, but some might not make sense (e.g., program="none" with order="coordinate"/"queryname"). Consider:
- Adding documentation about valid combinations
- Implementing validation logic to skip invalid combinations
Example improvement:
# Define valid combinations VALID_COMBINATIONS = { "none": [""], # no sorting "samtools": ["coordinate", "queryname"], "picard": ["coordinate", "queryname"] } for program, valid_orders in VALID_COMBINATIONS.items(): for order in valid_orders: for extension in ["sam", "bam", "cram"]: # Rule definition here
129-134
: Clean up commented output parametersThe commented output parameters appear to be unused. If they're not needed, consider removing them for better code clarity.
output: f"mapped_idx/{{sample}}.{program}_{order}.{extension}", - # idx="", - # metrics="", - # unaligned="", - # unpaired="", - # unconcordant="", - # concordant="",
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
bio/bowtie2/align/test/Snakefile
(2 hunks)bio/bowtie2/align/test/genome.fasta.fai
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- bio/bowtie2/align/test/genome.fasta.fai
🔇 Additional comments (2)
bio/bowtie2/align/test/Snakefile (2)
89-89
: LGTM: Proper CRAM reference handling
The addition of the FASTA index requirement (ref_fai
) is correct and necessary for CRAM output handling. This change properly addresses the previous review feedback about CRAM reference validation.
107-109
: LGTM: Comprehensive implementation of sorting options
The nested loop structure effectively generates test rules for all combinations of sorting programs (none/samtools/picard), orders (coordinate/queryname), and output formats (sam/bam/cram).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (3)
bio/bowtie2/align/wrapper.py (3)
15-24
: Consider handling additional edge cases in get_extensionWhile the function works correctly for standard cases, consider handling these edge cases:
- Files without extensions
- Files with multiple dots in the name
- Case-insensitive extension comparison
def get_extension(filename: str) -> str: """ Return file format since Bowtie2 reads files that could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2). """ + if not filename or "." not in filename: + raise ValueError("Invalid filename: must have an extension") if filename.endswith((".gz", ".bz2")): - return filename.split(".")[-2].lower() + parts = filename.lower().split(".") + return parts[-2] if len(parts) > 2 else "" - return filename.split(".")[-1].lower() + return filename.lower().split(".")[-1]
174-176
: Enhance CRAM output validationWhen using CRAM output with samtools, consider validating that the reference file exists and is readable.
if bam_extension == "cram": + if not path.isfile(REF): + raise ValueError(f"Reference file not found or not readable: {REF}") samtools_opts += f" --reference {REF} "
205-210
: Consider using a list for command constructionThe command construction could be more maintainable by using a list of command parts.
- shell( - "( bowtie2" - " --threads {bowtie2_threads}" - " {CMD_INPUT} " - " -x {index}" - " {extra}" - " " + PIPE_CMD + ") {LOG}" - ) + cmd_parts = [ + "(", + "bowtie2", + f"--threads {bowtie2_threads}", + CMD_INPUT, + f"-x {index}", + extra, + PIPE_CMD, + f") {LOG}" + ] + shell(" ".join(cmd_parts))
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
bio/bowtie2/align/wrapper.py
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
bio/bowtie2/align/wrapper.py (2)
Pattern **/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self
argument of methods.
Do not suggest type annotation of the cls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return
statement.
Pattern **/wrapper.py
: Do not complain about use of undefined variable called snakemake
.
🪛 Ruff (0.8.0)
bio/bowtie2/align/wrapper.py
27-27: Undefined name snakemake
(F821)
40-40: Undefined name snakemake
(F821)
45-45: Undefined name snakemake
(F821)
55-55: Undefined name snakemake
(F821)
58-58: Undefined name snakemake
(F821)
62-62: Undefined name snakemake
(F821)
63-63: Undefined name snakemake
(F821)
64-64: Undefined name snakemake
(F821)
65-65: Undefined name snakemake
(F821)
66-66: Undefined name snakemake
(F821)
67-67: Undefined name snakemake
(F821)
73-73: Undefined name snakemake
(F821)
76-76: Undefined name snakemake
(F821)
77-77: Undefined name snakemake
(F821)
81-81: Undefined name snakemake
(F821)
82-82: Undefined name snakemake
(F821)
83-83: Undefined name snakemake
(F821)
84-84: Undefined name snakemake
(F821)
85-85: Undefined name snakemake
(F821)
87-87: Undefined name snakemake
(F821)
89-89: Undefined name snakemake
(F821)
126-129: Use ternary operator CMD_INPUT = f"--interleaved {SAMPLE}" if interleaved else f"-U {SAMPLE}"
instead of if
-else
-block
Replace if
-else
-block with CMD_INPUT = f"--interleaved {SAMPLE}" if interleaved else f"-U {SAMPLE}"
(SIM108)
201-201: Undefined name snakemake
(F821)
🔇 Additional comments (3)
bio/bowtie2/align/wrapper.py (3)
1-14
: LGTM! Imports and header are well-organized.
The imports are appropriate for the added functionality, including necessary modules for temporary file handling and path operations.
49-50
: Fix typo in error message
The word "declarad" should be "declared" in the error message.
- f"Missing required indices: {missing_idx} declarad as input.idx.\n"
+ f"Missing required indices: {missing_idx} declared as input.idx.\n"
76-78
: Add early thread validation
The thread validation should occur before thread allocation to prevent negative thread counts.
+if snakemake.threads < 2:
+ raise ValueError("This wrapper requires at least 2 threads")
bowtie2_threads = snakemake.threads
sort_threads = snakemake.threads - 1
🧰 Tools
🪛 Ruff (0.8.0)
76-76: Undefined name snakemake
(F821)
77-77: Undefined name snakemake
(F821)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (2)
bio/bowtie2/align/test/Snakefile (2)
136-136
: Consider making threads configurableThe hardcoded thread count of 8 might not be optimal for all environments. Consider making this configurable through the config file or workflow resources.
- threads: 8 # Use at least two threads + threads: config.get("bowtie2_threads", 8) # Configurable with default
132-132
: Document sort_extra parameter usageThe
sort_extra
parameter is included but empty. Please add a comment documenting its purpose and expected values for different sorting programs (samtools/picard).- sort_extra="", + sort_extra="", # Additional arguments for sort program (e.g., "-m 4G" for samtools, "MAX_RECORDS_IN_RAM=1000000" for Picard)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
bio/bowtie2/align/test/Snakefile
(2 hunks)
🔇 Additional comments (1)
bio/bowtie2/align/test/Snakefile (1)
107-109
: LGTM! Well-structured dynamic rule generation
The nested loop structure efficiently generates all necessary combinations of sorting orders, output formats, and processing programs. This aligns well with the PR objective of supporting different sorting options.
bio/bowtie2/align/test/Snakefile
Outdated
ref="genome.fasta", | ||
ref_fai="genome.fasta.fai", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Conditionally include reference inputs
The reference inputs (ref
and ref_fai
) are currently always required, but they're only needed for CRAM output. Consider making these conditional based on the output format.
# Example approach using a lambda function in input
input:
sample=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
idx=multiext(...),
**lambda wildcards: {
"ref": "genome.fasta",
"ref_fai": "genome.fasta.fai"
} if extension == "cram" else {}
@@ -86,6 +86,7 @@ | |||
".rev.2.bt2", | |||
), | |||
ref="genome.fasta", | |||
ref_fai="genome.fasta.fai", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Make the reference index path configurable
The hardcoded path "genome.fasta.fai"
violates the requirement that input file paths should be selectable arbitrarily. Consider making this path configurable through a config file or rule parameter.
- ref_fai="genome.fasta.fai",
+ ref_fai=config["ref_fai"], # or another configurable source
Committable suggestion skipped: line range outside the PR's diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
bio/bowtie2/align/wrapper.py (1)
46-48
:⚠️ Potential issueImprove thread allocation logic
The current thread allocation might lead to issues when
snakemake.threads
is 1.Add validation before thread allocation:
THREADS = snakemake.threads +if THREADS < 2: + raise ValueError("This wrapper requires at least 2 threads") sort_threads = snakemake.threads - 1🧰 Tools
🪛 Ruff (0.8.2)
46-46: Undefined name
snakemake
(F821)
47-47: Undefined name
snakemake
(F821)
🧹 Nitpick comments (6)
bio/bowtie2/align/meta.yaml (1)
17-21
: Consider documenting optional outputs in notes sectionThe commented optional output paths could be moved to the notes section to maintain cleaner YAML structure while preserving the documentation of available outputs.
output: - SAM/BAM/CRAM file. This must be the first output file in the list. - idx: Optional path to bam index. - # - metrics: Optional path to metrics file. - # - unaligned: Optional path to unaligned unpaired reads. - # - unpaired: Optional path to unpaired reads that aligned at least once. - # - unconcordant: Optional path to pairs that didn't align concordantly. - # - concordant: Optional path to pairs that aligned concordantly at least once. notes: | * The `extra` param allows for additional arguments for bowtie2. * This wrapper uses an inner pipe. Make sure to use at least two threads in your Snakefile. + * Optional outputs: + - metrics: Path to metrics file + - unaligned: Path to unaligned unpaired reads + - unpaired: Path to unpaired reads that aligned at least once + - unconcordant: Path to pairs that didn't align concordantly + - concordant: Path to pairs that aligned concordantly at least oncebio/bowtie2/align/wrapper.py (5)
16-23
: Add input validation to get_extensionWhile the function handles basic cases well, it could benefit from input validation to handle edge cases.
Consider this enhanced implementation:
def get_extension(filename: str) -> str: + if not filename: + raise ValueError("Filename cannot be empty") if filename.endswith((".gz", ".bz2")): return filename.split(".")[-2].lower() return filename.split(".")[-1].lower()
65-66
: Enhance error message for input validationThe error message could be more descriptive to help users understand the expected format.
Consider this improvement:
if not isinstance(SAMPLE, str) and len(SAMPLE) not in [1, 2]: - raise ValueError("input must have 1 (single-end) or 2 (paired-end) elements") + raise ValueError( + f"Input must have 1 (single-end) or 2 (paired-end) elements, got {len(SAMPLE)} elements" + )
106-110
: Improve error messages for sort parametersThe error messages for sort parameters could be more user-friendly.
Consider these improvements:
if SORT_ORDER not in {"coordinate", "queryname"}: raise ValueError( - f"Unexpected value for sort_order ({SORT_ORDER})" - "Valid values are 'coordinate' or 'queryname'" + f"Invalid sort_order '{SORT_ORDER}'. Valid values are: 'coordinate' or 'queryname'" ) if SORT_PROGRAM not in {"none", "samtools", "picard"}: raise ValueError( - f"Unexpected value for sort_program ({SORT_PROGRAM})" - "Valid values are 'none', 'samtools' or 'picard'" + f"Invalid sort_program '{SORT_PROGRAM}'. Valid values are: 'none', 'samtools' or 'picard'" )Also applies to: 112-116
150-153
: Simplify conditional logic using ternary operatorThe conditional logic for interleaved input can be simplified.
- if IS_INTERLEAVED: - cmd_input = f"--interleaved {SAMPLE}" - else: - cmd_input = f"-U {SAMPLE}" + cmd_input = f"--interleaved {SAMPLE}" if IS_INTERLEAVED else f"-U {SAMPLE}"🧰 Tools
🪛 Ruff (0.8.2)
150-153: Use ternary operator
cmd_input = f"--interleaved {SAMPLE}" if IS_INTERLEAVED else f"-U {SAMPLE}"
instead ofif
-else
-blockReplace
if
-else
-block withcmd_input = f"--interleaved {SAMPLE}" if IS_INTERLEAVED else f"-U {SAMPLE}"
(SIM108)
194-194
: Remove unnecessary f-string prefixesSome strings are marked as f-strings but don't contain any placeholders.
- SAMTOOLS_OPTS += f"--write-index " + SAMTOOLS_OPTS += "--write-index " - PICARD_OPTS += f"--CREATE_INDEX true " + PICARD_OPTS += "--CREATE_INDEX true "Also applies to: 214-214
🧰 Tools
🪛 Ruff (0.8.2)
194-194: f-string without any placeholders
Remove extraneous
f
prefix(F541)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
bio/bowtie2/align/meta.yaml
(2 hunks)bio/bowtie2/align/test/Snakefile
(4 hunks)bio/bowtie2/align/wrapper.py
(1 hunks)test_wrappers.py
(4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
test_wrappers.py (1)
Pattern **/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self
argument of methods.
Do not suggest type annotation of the cls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return
statement.
bio/bowtie2/align/wrapper.py (2)
Pattern **/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self
argument of methods.
Do not suggest type annotation of the cls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return
statement.
Pattern **/wrapper.py
: Do not complain about use of undefined variable called snakemake
.
🪛 Ruff (0.8.2)
bio/bowtie2/align/wrapper.py
27-27: Undefined name snakemake
(F821)
28-28: Undefined name snakemake
(F821)
29-29: Undefined name snakemake
(F821)
30-30: Undefined name snakemake
(F821)
33-33: Undefined name snakemake
(F821)
39-39: Undefined name snakemake
(F821)
43-43: Undefined name snakemake
(F821)
46-46: Undefined name snakemake
(F821)
47-47: Undefined name snakemake
(F821)
51-51: Undefined name snakemake
(F821)
52-52: Undefined name snakemake
(F821)
53-53: Undefined name snakemake
(F821)
54-54: Undefined name snakemake
(F821)
55-55: Undefined name snakemake
(F821)
57-57: Undefined name snakemake
(F821)
59-59: Undefined name snakemake
(F821)
70-70: Undefined name snakemake
(F821)
76-76: Undefined name snakemake
(F821)
150-153: Use ternary operator cmd_input = f"--interleaved {SAMPLE}" if IS_INTERLEAVED else f"-U {SAMPLE}"
instead of if
-else
-block
Replace if
-else
-block with cmd_input = f"--interleaved {SAMPLE}" if IS_INTERLEAVED else f"-U {SAMPLE}"
(SIM108)
194-194: f-string without any placeholders
Remove extraneous f
prefix
(F541)
214-214: f-string without any placeholders
Remove extraneous f
prefix
(F541)
🔇 Additional comments (8)
bio/bowtie2/align/meta.yaml (1)
25-27
: LGTM! Well-structured sorting parameters
The new sorting parameters are well-organized and provide good flexibility:
sort_program
: Clear options for sorting implementationsort_extra
: Allows customization of sorting behaviorsort_order
: Standard options for sort ordering
bio/bowtie2/align/test/Snakefile (2)
Line range hint 1-21
: LGTM! Well-structured base rule
The base rule test_bowtie2_sam
is well-implemented with:
- Clear input/output paths
- Proper thread specification (>= 2 as required)
- Consistent log file naming
75-90
: LGTM! Well-designed sorting test rules
The sorting test rules are properly implemented:
- Inherit from base rule using 'use rule as'
- Consistent output path structure
- Clear parameter specification for sort programs
test_wrappers.py (1)
2045-2070
: LGTM! Comprehensive test coverage
The bowtie2 alignment tests provide good coverage:
- Tests all output formats (SAM/BAM/CRAM)
- Tests both sorting programs (samtools/picard)
- Consistent with Snakefile output paths
- Proper core allocation
bio/bowtie2/align/wrapper.py (4)
1-13
: LGTM: Header and imports are appropriate
The imports are well-organized and include all necessary modules for the enhanced functionality.
85-99
: LGTM: Output validation is thorough
The validation of output file extensions and index compatibility is comprehensive and well-implemented.
125-139
: LGTM: Compatibility checks are comprehensive
The validation of CRAM output requirements and index file compatibility is thorough and well-implemented.
234-242
: LGTM: Command execution is well-implemented
The use of a temporary directory and command construction is clean and efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
bio/bowtie2/align/wrapper.py (3)
34-38
: Consider removing commented-out output variables.These commented-out variables (
METRICS
,UNALIGNED
, etc.) and their related code (lines 179-188) should either be implemented or removed to maintain code cleanliness.Would you like me to help clean up these commented sections?
197-198
: Simplify thread check condition.The condition
if sort_threads >= 1
is redundant since we've already validated that there are at least 2 threads when sorting is enabled.- if sort_threads >= 1: - SAMTOOLS_OPTS += f"--threads {sort_threads} " + SAMTOOLS_OPTS += f"--threads {sort_threads} "
204-205
: Make SORT_EXTRA modification more explicit.The modification of
SORT_EXTRA
should be more explicit about appending the queryname sort option.- if SORT_ORDER == "queryname": - SORT_EXTRA += "-n " + queryname_opt = "-n " if SORT_ORDER == "queryname" else "" + SORT_EXTRA += queryname_opt
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
bio/bowtie2/align/wrapper.py
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
bio/bowtie2/align/wrapper.py (2)
Pattern **/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self
argument of methods.
Do not suggest type annotation of the cls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return
statement.
Pattern **/wrapper.py
: Do not complain about use of undefined variable called snakemake
.
🪛 Ruff (0.8.2)
bio/bowtie2/align/wrapper.py
27-27: Undefined name snakemake
(F821)
28-28: Undefined name snakemake
(F821)
29-29: Undefined name snakemake
(F821)
30-30: Undefined name snakemake
(F821)
33-33: Undefined name snakemake
(F821)
39-39: Undefined name snakemake
(F821)
43-43: Undefined name snakemake
(F821)
46-46: Undefined name snakemake
(F821)
50-50: Undefined name snakemake
(F821)
51-51: Undefined name snakemake
(F821)
52-52: Undefined name snakemake
(F821)
53-53: Undefined name snakemake
(F821)
54-54: Undefined name snakemake
(F821)
56-56: Undefined name snakemake
(F821)
58-58: Undefined name snakemake
(F821)
72-72: Undefined name snakemake
(F821)
78-78: Undefined name snakemake
(F821)
165-165: Undefined name snakemake
(F821)
🔇 Additional comments (3)
bio/bowtie2/align/wrapper.py (3)
16-23
: LGTM! Well-documented helper function.
The get_extension
function correctly handles both compressed and uncompressed files, with proper type annotations and documentation.
242-249
: LGTM! Proper temporary directory handling.
The command execution is well-implemented using Python's tempfile
module for secure temporary directory management.
52-54
: Verify sorting functionality across different scenarios.
The implementation supports both samtools and picard sorting with different sort orders. Let's verify this functionality:
Also applies to: 194-239
✅ Verification successful
Based on the search results, I can now generate the final response as I have sufficient information to verify the sorting functionality. The results show that:
- The sorting implementation in bowtie2/align is consistent with other bioinformatics tools in the codebase (bwa, bwa-mem2, dragmap, etc.)
- The sort parameters (sort_program, sort_order, sort_extra) are used consistently across the codebase
- There are test cases in bio/bowtie2/align/test/Snakefile that verify both samtools and picard sorting
- The sort_order parameter correctly handles both "coordinate" and "queryname" options
- The implementation properly handles extra arguments through sort_extra parameter
Sorting functionality implementation is verified and consistent with codebase patterns
- The implementation follows the established pattern used across multiple bioinformatics tools in the codebase
- Test cases exist for both samtools and picard sorting implementations
- The sort parameters are handled consistently with other similar wrappers
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify sorting implementation and its usage across the codebase
# Check for other wrappers using similar sorting logic
echo "Checking for similar sorting implementations:"
rg -A 5 'sort_program.*=.*("samtools"|"picard")'
# Check for test cases covering sorting functionality
echo "Checking for sort-related test cases:"
rg -A 10 'sort_program.*=.*(samtools|picard)' test/
# Check for potential issues with sort order handling
echo "Checking sort order handling:"
rg -A 5 'sort_order.*=.*("coordinate"|"queryname")'
Length of output: 25780
🧰 Tools
🪛 Ruff (0.8.2)
52-52: Undefined name snakemake
(F821)
53-53: Undefined name snakemake
(F821)
54-54: Undefined name snakemake
(F821)
bio/bowtie2/align/wrapper.py
Outdated
if THREADS == 1 and SORT_PROGRAM != "none": | ||
raise ValueError( | ||
"Not enough threads requested. This wrapper requires at least two threads: " | ||
"one for bowtie2 and one for samtools/picard." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consolidate duplicate thread validation logic.
There are two identical thread validation blocks. Consider consolidating them into a single validation at the beginning of the validation section.
Here's a suggested consolidation:
- if THREADS == 1 and SORT_PROGRAM != "none":
- raise ValueError(
- "Not enough threads requested. This wrapper requires at least two threads: "
- "one for bowtie2 and one for samtools/picard."
- )
# ... other validations ...
- if SORT_PROGRAM != "none" and THREADS <= 1:
- raise ValueError(
- "Not enough threads requested. This wrapper requires at least two threads: "
- "one for bowtie2 and one for samtools/picard."
- )
+ # Move this to the beginning of validation section
+ if SORT_PROGRAM != "none" and THREADS < 2:
+ raise ValueError(
+ "Not enough threads requested. This wrapper requires at least two threads: "
+ "one for bowtie2 and one for samtools/picard."
+ )
Also applies to: 127-131
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
bio/bowtie2/align/wrapper.py (4)
16-23
: Add input validation to get_extension functionThe function should validate that the input filename is not empty and is a string.
def get_extension(filename: str) -> str: + if not filename: + raise ValueError("Filename cannot be empty") if filename.endswith((".gz", ".bz2")): return filename.split(".")[-2].lower() return filename.split(".")[-1].lower()
34-38
: Clean up or document commented codeSeveral output variables are commented out. If these features are planned for future implementation, please add a TODO comment explaining the plan. If they are no longer needed, consider removing them.
108-112
: Improve error message formattingThe error message for invalid sort_order is missing proper punctuation.
raise ValueError( - f"Unexpected value for sort_order ({SORT_ORDER})" - "Valid values are 'coordinate' or 'queryname'" + f"Unexpected value for sort_order ({SORT_ORDER}). " + "Valid values are 'coordinate' or 'queryname'." )
187-235
: Add error handling for invalid sort program combinationsWhile the validation checks prevent invalid combinations, it would be good practice to add an explicit error case in the pattern matching.
case _: + if SORT_PROGRAM not in {"none"}: + raise ValueError(f"Unexpected sort program: {SORT_PROGRAM}") if sort_threads >= 1: SAMTOOLS_OPTS += f"--threads {sort_threads} "
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
bio/bowtie2/align/wrapper.py
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
bio/bowtie2/align/wrapper.py (2)
Pattern **/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self
argument of methods.
Do not suggest type annotation of the cls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return
statement.
Pattern **/wrapper.py
: Do not complain about use of undefined variable called snakemake
.
🪛 Ruff (0.8.2)
bio/bowtie2/align/wrapper.py
27-27: Undefined name snakemake
(F821)
28-28: Undefined name snakemake
(F821)
29-29: Undefined name snakemake
(F821)
30-30: Undefined name snakemake
(F821)
33-33: Undefined name snakemake
(F821)
39-39: Undefined name snakemake
(F821)
43-43: Undefined name snakemake
(F821)
46-46: Undefined name snakemake
(F821)
50-50: Undefined name snakemake
(F821)
51-51: Undefined name snakemake
(F821)
52-52: Undefined name snakemake
(F821)
53-53: Undefined name snakemake
(F821)
54-54: Undefined name snakemake
(F821)
56-56: Undefined name snakemake
(F821)
58-58: Undefined name snakemake
(F821)
72-72: Undefined name snakemake
(F821)
78-78: Undefined name snakemake
(F821)
158-158: Undefined name snakemake
(F821)
🔇 Additional comments (1)
bio/bowtie2/align/wrapper.py (1)
239-246
: LGTM! Clean implementation of command execution
The implementation properly manages temporary directories and handles command execution with appropriate logging.
@fgvieira I think I am done with this wrapper. I've left commented the part about outputting extra files from bowtie2 (metrics, unaligned, unpaired, concordant and unconcordant). Some of them introduce a lot of complexity in the code because bowtie can output 1 or 2 fastq files depending on if the input is SE or PE. These files can be easily obtained with Let me know if you want me to keep that functionality on to uncomment it. |
It seems you still have un-pushed commits. Can you push them? |
I don't understand. All my commits are on the |
I clicked the Update Branch button. I hope it was that. |
On the |
I see that we both removed the patch number, but in a comment later you appended the one for
Is that correct? That one has to be fixed to 3.3.0? |
Ok. I see other wrappers with picard-slim =3.3.0. Fixing that and repinning everything. |
but also |
Ok. Pinned everything with the exception of samtools. The pin file stays the same after that. |
Why the total re-write of the wrapper? As I see it, you only needed to add the logic for snakemake-wrappers/bio/bwa/mem/wrapper.py Lines 43 to 66 in 7b6aa5c
and change the line:
|
My initial plan was to just do either samtools view or sort, but I saw the wrapper from bwa-mem2 and I saw that it could be done for sam, bam and cram, sorted and unsorted, and with samtools and picard. But bwa-mem2 only aligns and generates the sam file. Bowtie2 can produce extra outputs. And now you brought to my attention And there are wrappers specific to a single sort program like bwa-mem2 and samblaster What is the official view from snakemake wrappers? A monolithic wrapper htat handles multiple tool combinations, or multiple silly wrappers? If we go through the silly wrappers, we can delete 100+ lines of code just for validation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (5)
bio/bowtie2/align/wrapper.py (5)
16-23
: Enhance extension handling robustnessThe extension handling could be improved to be more resilient:
- Case-insensitive extension comparison
- Handle paths with multiple dots
Consider this improved implementation:
def get_extension(filename: str) -> str: """ Return file format since Bowtie2 reads files that could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2). """ - if filename.endswith((".gz", ".bz2")): - return filename.split(".")[-2].lower() - return filename.split(".")[-1].lower() + filename = filename.lower() + base, ext = path.splitext(filename) + if ext in ('.gz', '.bz2'): + return path.splitext(base)[1][1:] # Remove leading dot + return ext[1:] # Remove leading dot
34-38
: Document rationale for commented-out output variablesThese commented-out variables suggest planned functionality for metrics and different alignment outputs. Consider adding a TODO comment explaining why they're disabled and when they might be re-enabled.
Add a comment explaining the rationale:
+ # TODO: These outputs are temporarily disabled due to complexity with SE/PE handling + # They can be re-enabled once we implement proper SE/PE output handling # METRICS = snakemake.output.get("metrics", None) # UNALIGNED = snakemake.output.get("unaligned", None) # UNPAIRED = snakemake.output.get("unpaired", None) # UNCONCORDANT = snakemake.output.get("unconcordant", None) # CONCORDANT = snakemake.output.get("concordant", None)
64-68
: Enhance error message with actual input detailsThe error message could be more helpful by including the actual input received.
if not isinstance(SAMPLE, str) and len(SAMPLE) not in [1, 2]: raise ValueError( "Input must have 1 (single-end) or 2 (paired-end) elements, " - f"got {len(SAMPLE)} elements" + f"got {len(SAMPLE)} elements: {SAMPLE}" )
91-101
: Standardize error message formattingThe error messages for output validation should follow a consistent format.
if bam_extension.lower() not in {"sam", "bam", "cram"}: raise ValueError( - f"Unrecognized extension for output file: {bam_extension}." - "Valid extensions are 'sam', 'bam' or 'cram'" + f"Unrecognized extension for output file: {bam_extension}. " + "Valid extensions are: 'sam', 'bam' or 'cram'" ) if bai_extension not in {None, "bai", "crai"}: raise ValueError( - f"Unrecognized extension for index file: {bai_extension}." - "Valid extensions are 'bai' or 'crai'" + f"Unrecognized extension for index file: {bai_extension}. " + "Valid extensions are: 'bai' or 'crai'" )
200-206
: Use consistent string formatting styleThe command string uses inconsistent formatting styles. Consider using f-strings consistently.
cmd_output = ( - "| samtools sort " - "{SAMTOOLS_OPTS} " - "{SORT_EXTRA} " - "-T {TMPDIR} " - "-o {bam} " + f"| samtools sort " + f"{SAMTOOLS_OPTS} " + f"{SORT_EXTRA} " + f"-T {TMPDIR} " + f"-o {bam} " )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
bio/bowtie2/align/wrapper.py
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
bio/bowtie2/align/wrapper.py (2)
Pattern **/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self
argument of methods.
Do not suggest type annotation of the cls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return
statement.
Pattern **/wrapper.py
: Do not complain about use of undefined variable called snakemake
.
🪛 Ruff (0.8.2)
bio/bowtie2/align/wrapper.py
27-27: Undefined name snakemake
(F821)
28-28: Undefined name snakemake
(F821)
29-29: Undefined name snakemake
(F821)
30-30: Undefined name snakemake
(F821)
33-33: Undefined name snakemake
(F821)
39-39: Undefined name snakemake
(F821)
43-43: Undefined name snakemake
(F821)
46-46: Undefined name snakemake
(F821)
50-50: Undefined name snakemake
(F821)
51-51: Undefined name snakemake
(F821)
52-52: Undefined name snakemake
(F821)
53-53: Undefined name snakemake
(F821)
54-54: Undefined name snakemake
(F821)
56-56: Undefined name snakemake
(F821)
58-58: Undefined name snakemake
(F821)
72-72: Undefined name snakemake
(F821)
78-78: Undefined name snakemake
(F821)
158-158: Undefined name snakemake
(F821)
🔇 Additional comments (1)
bio/bowtie2/align/wrapper.py (1)
247-254
: LGTM! Proper temporary directory and shell execution handling
The implementation correctly uses a temporary directory and properly handles shell command execution with logging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
bio/bowtie2/align/wrapper.py (4)
16-25
: Add input validation to get_extensionThe function should validate that the input is not None or empty to prevent potential runtime errors.
def get_extension(filename: str) -> str: + if not filename: + raise ValueError("Filename cannot be None or empty") filename = filename.lower() base, ext = path.splitext(filename)
36-42
: Consider documenting the plan for re-enabling outputsThe TODO comment indicates that several outputs are temporarily disabled due to SE/PE handling complexity. Consider:
- Creating a tracking issue for re-enabling these outputs
- Documenting the specific challenges with SE/PE handling
- Adding acceptance criteria for when these can be re-enabled
Would you like me to help create a GitHub issue to track this enhancement?
112-116
: Improve error message formattingThe error message for invalid sort order is missing a space between the message parts.
raise ValueError( - f"Unexpected value for sort_order ({SORT_ORDER})" - "Valid values are 'coordinate' or 'queryname'" + f"Unexpected value for sort_order ({SORT_ORDER}). " + "Valid values are 'coordinate' or 'queryname'" )
193-210
: Consider extracting samtools command constructionThe samtools command construction logic is complex and could benefit from being extracted into a helper function for better maintainability.
+def build_samtools_sort_command(bam, bai, sort_order, ref, sort_threads): + opts = get_samtools_opts(snakemake, parse_threads=False, param_name="sort_extra") + opts += f" --threads {sort_threads} " + if bai: + opts += "--write-index " + if sort_order == "queryname": + opts += "-n " + if get_extension(bam) == "cram": + opts += f"--reference {ref} --output-fmt CRAM " + return ( + "| samtools sort " + f"{opts} " + "{SORT_EXTRA} " + "-T {TMPDIR} " + f"-o {bam} " + ) match SORT_PROGRAM: case "samtools": - SAMTOOLS_OPTS += f"--threads {sort_threads} " - if BAI: - bam = f"{BAM}##idx##{BAI}" - SAMTOOLS_OPTS += "--write-index " - else: - bam = BAM - if SORT_ORDER == "queryname": - SORT_EXTRA += "-n " - if bam_extension == "cram": - SAMTOOLS_OPTS += f"--reference {REF} --output-fmt CRAM " - cmd_output = ( - "| samtools sort " - "{SAMTOOLS_OPTS} " - "{SORT_EXTRA} " - "-T {TMPDIR} " - "-o {bam} " - ) + output_bam = f"{BAM}##idx##{BAI}" if BAI else BAM + cmd_output = build_samtools_sort_command( + output_bam, BAI, SORT_ORDER, REF, sort_threads + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
bio/bowtie2/align/wrapper.py
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
bio/bowtie2/align/wrapper.py (2)
Pattern **/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self
argument of methods.
Do not suggest type annotation of the cls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return
statement.
Pattern **/wrapper.py
: Do not complain about use of undefined variable called snakemake
.
🪛 Ruff (0.8.2)
bio/bowtie2/align/wrapper.py
29-29: Undefined name snakemake
(F821)
30-30: Undefined name snakemake
(F821)
31-31: Undefined name snakemake
(F821)
32-32: Undefined name snakemake
(F821)
35-35: Undefined name snakemake
(F821)
43-43: Undefined name snakemake
(F821)
47-47: Undefined name snakemake
(F821)
50-50: Undefined name snakemake
(F821)
54-54: Undefined name snakemake
(F821)
55-55: Undefined name snakemake
(F821)
56-56: Undefined name snakemake
(F821)
57-57: Undefined name snakemake
(F821)
58-58: Undefined name snakemake
(F821)
60-60: Undefined name snakemake
(F821)
62-62: Undefined name snakemake
(F821)
76-76: Undefined name snakemake
(F821)
82-82: Undefined name snakemake
(F821)
162-162: Undefined name snakemake
(F821)
🔇 Additional comments (3)
bio/bowtie2/align/wrapper.py (3)
124-127
: Consolidate thread validation logic
This thread validation appears to be duplicated. Consider consolidating all thread-related validations into a single location at the beginning of the validation section.
251-258
: LGTM! Good use of temporary directory
The implementation properly handles temporary files and includes appropriate logging.
161-162
: Verify thread allocation between bowtie2 and sort programs
The thread allocation between bowtie2 and sort programs needs verification to ensure optimal performance.
✅ Verification successful
Based on the code analysis, I can now provide a final response regarding the thread allocation verification:
Thread allocation between bowtie2 and sort is properly implemented
The wrapper implements a sensible thread allocation strategy:
- When sorting is enabled, it reserves 1 thread for sorting (
sort_threads = snakemake.threads - 1
) and assigns the rest to bowtie2 (cmd_threads = THREADS
) - The wrapper enforces this by checking
if SORT_PROGRAM != "none" and THREADS < 2: raise ValueError("Not enough threads requested...")
- This prevents resource contention between bowtie2 alignment and the sorting process
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if other wrappers in the codebase have similar thread allocation patterns
# This helps validate our approach to thread distribution
# Look for thread allocation patterns in other wrappers
rg -A 5 "threads\s*=\s*snakemake\.threads\s*-\s*[0-9]+" "bio/**/wrapper.py"
# Look for bowtie2 thread parameter usage
rg -A 5 "--threads.*bowtie2" "bio/**/wrapper.py"
Length of output: 632
Script:
#!/bin/bash
# Let's try a different approach to find the files and analyze thread allocation
# First, let's find the wrapper.py file
fd "wrapper.py" bio/
# Then look for thread-related patterns in the specific file
fd "wrapper.py" bio/ --exec rg -l "threads"
# Get the content of the specific wrapper file to analyze thread allocation
fd "wrapper.py" bio/bowtie2/align --exec cat {}
# Look for any configuration or documentation files that might specify threading behavior
fd "meta.yaml|README.md" bio/bowtie2/align --exec cat {}
Length of output: 27634
🧰 Tools
🪛 Ruff (0.8.2)
162-162: Undefined name snakemake
(F821)
If it is just different sorting programs, I'd say a single wrapper. |
This PR shamelessly copies functionalities from bwa-mem2 to sort or not, do it by coordinate or queryname, and choose between samtools and picard to do so
QC
snakemake-wrappers
.While the contributions guidelines are more extensive, please particularly ensure that:
test.py
was updated to call any added or updated example rules in aSnakefile
input:
andoutput:
file paths in the rules can be chosen arbitrarilyinput:
oroutput:
)tempfile.gettempdir()
points tometa.yaml
contains a link to the documentation of the respective tool or command underurl:
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Tests