MTase-linker: Flag for methylation degree threshold #78

Open
Ge0rges opened this issue Oct 14, 2024 · 10 comments
@Ge0rges

Ge0rges commented Oct 14, 2024

Hello,

Wanted to share the following error obtained when running MTase-linker.

Traceback (most recent call last):
  File "/localdata/researchdrive/gkanaan/tools/ML_dependencies/ML_envs/06b3259e5e81fef4369da217324f5061_/lib/python3.12/site-packages/pandas/core/indexes/base.py", line 3790, in get_loc
    return self._engine.get_loc(casted_key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "index.pyx", line 152, in pandas._libs.index.IndexEngine.get_loc
  File "index.pyx", line 181, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 7080, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 7088, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'n_mod_bin'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/localdata/researchdrive/gkanaan/seaice_methylation/.snakemake/scripts/tmpk_q_90_l.motif_assignment.py", line 43, in <module>
    mean_methylation = nanomotif_table['n_mod_bin'] / (nanomotif_table['n_mod_bin'] + nanomotif_table['n_nomod_bin'])
                       ~~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/localdata/researchdrive/gkanaan/tools/ML_dependencies/ML_envs/06b3259e5e81fef4369da217324f5061_/lib/python3.12/site-packages/pandas/core/frame.py", line 3893, in __getitem__
    indexer = self.columns.get_loc(key)
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/localdata/researchdrive/gkanaan/tools/ML_dependencies/ML_envs/06b3259e5e81fef4369da217324f5061_/lib/python3.12/site-packages/pandas/core/indexes/base.py", line 3797, in get_loc
    raise KeyError(key) from err
KeyError: 'n_mod_bin'
@JSBoejer
Collaborator

Are you using the bin-motifs.tsv file as input to MTase-linker? The motifs.tsv and motifs-scored.tsv files do not work with the pipeline.
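
A quick way to catch a wrong input file before the pipeline runs is to check the header for the columns the traceback shows the script reading. The following is a minimal sketch, not part of MTase-linker: the function name `missing_columns` is made up, the required set is assumed from the traceback alone, and the second header is a hypothetical stand-in for a table without the bin-level columns.

```python
import csv
import io

# Columns that the traceback shows motif_assignment.py reading.
# Assumption: the real script may need more columns than these two.
REQUIRED_COLUMNS = ("n_mod_bin", "n_nomod_bin")


def missing_columns(tsv_text, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the TSV header."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, [])  # an empty file yields an empty header
    return [name for name in required if name not in header]


# Header in the style of a bin-level table.
bin_level_header = "bin\tmod_type\tmotif\tmod_position\tn_mod_bin\tn_nomod_bin\n"
# Hypothetical header without the bin-level counts (illustration only).
other_header = "contig\tmotif\tmod_position\tmod_type\tn_mod\tn_nomod\n"

print(missing_columns(bin_level_header))
print(missing_columns(other_header))
```

Failing early with the list of missing columns gives a clearer message than a pandas KeyError raised deep inside a Snakemake rule.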

@Ge0rges
Author

Ge0rges commented Oct 18, 2024

Ah yes, I am using motifs.tsv. I can switch to using bin-motifs.tsv. I am conducting this analysis within the context of a single genome, so I was misled by the bin prefix.

What is the difference between those files? The output section doesn't include that information yet.

@JSBoejer
Collaborator

JSBoejer commented Oct 21, 2024

Also, please make sure to use the newest version of Nanomotif (v. 0.1.15) as it resolves issues present in previous versions

The difference between motifs.tsv and bin-motifs.tsv lies in a series of post-processing steps applied to generate a consensus set of motifs across contigs for each bin (genome in your case). Some contigs will not have the motifs in their sequence, and other contigs might show slight variation in a motif compared to the rest of the bin, due to noise or simply the context in which the motif is observed. To account for this, we apply post-processing to find consensus motifs across a whole genome and output them in bin-motifs.tsv.

If you want to find motifs in a single genome, bin-motifs.tsv is your go-to file. motifs.tsv and motifs-scored.tsv are more relevant for binning.

For more details on these post-processing steps, refer to supplementary note 1 of our preprint: https://www.biorxiv.org/content/10.1101/2024.04.29.591623v1

@Ge0rges
Author

Ge0rges commented Oct 21, 2024

Got it. Regarding the version, did you mean v0.4.15? That is the version indicated on both the PyPI page and your conda meta.yaml, and my installation defaulted to it. When I forced pip to install 0.1.15, I got a version of nanomotif with slightly different commands, I believe including complete-workflow, which I think wasn't present in the version I had installed previously.

@JSBoejer
Collaborator

Yes, sorry about the confusion. I meant v0.4.15.

@Ge0rges
Author

Ge0rges commented Oct 22, 2024

I am now getting a different error on the latest version with the correct input file.

Select jobs to execute...

[Tue Oct 22 11:35:25 2024]
rule motif_assignment:
    input: nanomotif/brevundimonas_r-contigs/mtase-linker/pfam_hmm_hits/brevundimonas_r-contigs_gene_id_mod_table.tsv, nanomotif/brevundimonas_r-contigs/mtase-linker/defensefinder/brevundimonas_r-contigs_processed_defense_finder_mtase.tsv, nanomotif/brevundimonas_r-contigs/mtase-linker/blastp/brevundimonas_r-contigs_rebase_mtase_sign_alignment.tsv, nanomotif/brevundimonas_r-contigs/bin-motifs.tsv, /localdata/researchdrive/gkanaan/seaice_methylation/nanomotif/contig_bin.tsv
    output: nanomotif/brevundimonas_r-contigs/mtase-linker/mtase_assignment_table.tsv, nanomotif/brevundimonas_r-contigs/mtase-linker/nanomotif_assignment_table.tsv
    jobid: 1
    reason: Missing output files: nanomotif/brevundimonas_r-contigs/mtase-linker/mtase_assignment_table.tsv; Input files updated by another job: nanomotif/brevundimonas_r-contigs/mtase-linker/pfam_hmm_hits/brevundimonas_r-contigs_gene_id_mod_table.tsv, nanomotif/brevundimonas_r-contigs/mtase-linker/defensefinder/brevundimonas_r-contigs_processed_defense_finder_mtase.tsv, nanomotif/brevundimonas_r-contigs/mtase-linker/blastp/brevundimonas_r-contigs_rebase_mtase_sign_alignment.tsv
    resources: tmpdir=/tmp

Activating conda environment: ../../../../researchdrive/gkanaan/tools/ML_dependencies/ML_envs/71dd0a79701938f24ea6c2c3e756d4dc_
Activating conda environment: ../../../../researchdrive/gkanaan/tools/ML_dependencies/ML_envs/71dd0a79701938f24ea6c2c3e756d4dc_
Traceback (most recent call last):
  File "/localdata/researchdrive/gkanaan/seaice_methylation/.snakemake/scripts/tmpnqxvjl1f.motif_assignment.py", line 103, in <module>
    nanomotif_table_mm50.loc[:,'linked'] = False
    ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
  File "/researchdrive/gkanaan/tools/ML_dependencies/ML_envs/71dd0a79701938f24ea6c2c3e756d4dc_/lib/python3.12/site-packages/pandas/core/indexing.py", line 885, in __setitem__
    iloc._setitem_with_indexer(indexer, value, self.name)
  File "/researchdrive/gkanaan/tools/ML_dependencies/ML_envs/71dd0a79701938f24ea6c2c3e756d4dc_/lib/python3.12/site-packages/pandas/core/indexing.py", line 1809, in _setitem_with_indexer
    raise ValueError(
ValueError: cannot set a frame with no defined index and a scalar
[Tue Oct 22 11:35:28 2024]
Error in rule motif_assignment:
    jobid: 1
    input: nanomotif/brevundimonas_r-contigs/mtase-linker/pfam_hmm_hits/brevundimonas_r-contigs_gene_id_mod_table.tsv, nanomotif/brevundimonas_r-contigs/mtase-linker/defensefinder/brevundimonas_r-contigs_processed_defense_finder_mtase.tsv, nanomotif/brevundimonas_r-contigs/mtase-linker/blastp/brevundimonas_r-contigs_rebase_mtase_sign_alignment.tsv, nanomotif/brevundimonas_r-contigs/bin-motifs.tsv, /localdata/researchdrive/gkanaan/seaice_methylation/nanomotif/contig_bin.tsv
    output: nanomotif/brevundimonas_r-contigs/mtase-linker/mtase_assignment_table.tsv, nanomotif/brevundimonas_r-contigs/mtase-linker/nanomotif_assignment_table.tsv
    conda-env: /researchdrive/gkanaan/tools/ML_dependencies/ML_envs/71dd0a79701938f24ea6c2c3e756d4dc_

RuleException:
CalledProcessError in file /Accounts/gkanaan/miniconda3/nanomotif/lib/python3.9/site-packages/nanomotif/mtase_linker/MTase_linker.smk, line 197:
Command 'source /Accounts/gkanaan/anaconda3/bin/activate '/researchdrive/gkanaan/tools/ML_dependencies/ML_envs/71dd0a79701938f24ea6c2c3e756d4dc_'; set -euo pipefail;  python /localdata/researchdrive/gkanaan/seaice_methylation/.snakemake/scripts/tmpnqxvjl1f.motif_assignment.py' returned non-zero exit status 1.
  File "/Accounts/gkanaan/miniconda3/nanomotif/lib/python3.9/site-packages/nanomotif/mtase_linker/MTase_linker.smk", line 197, in __rule_motif_assignment
  File "/Accounts/gkanaan/miniconda3/nanomotif/lib/python3.9/concurrent/futures/thread.py", line 58, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-10-22T113315.257866.snakemake.log
MTase-linker failed with error: Command '['snakemake', '--snakefile', '/Accounts/gkanaan/miniconda3/nanomotif/lib/python3.9/site-packages/nanomotif/mtase_linker/MTase_linker.smk', '--cores', '20', '--config', 'THREADS=20', 'ASSEMBLY=/localdata/researchdrive/gkanaan/seaice_methylation/mags//brevundimonas_r-contigs.fna', 'CONTIG_BIN=/localdata/researchdrive/gkanaan/seaice_methylation/nanomotif/contig_bin.tsv', 'OUTPUTDIRECTORY=nanomotif/brevundimonas_r-contigs/mtase-linker', 'DEPENDENCY_PATH=/researchdrive/gkanaan/tools/ML_dependencies', 'IDENTITY=80', 'QCOVS=80', 'NANOMOTIF=nanomotif/brevundimonas_r-contigs/bin-motifs.tsv', '--use-conda', '--conda-prefix', '/researchdrive/gkanaan/tools/ML_dependencies/ML_envs']' returned non-zero exit status 1.

JSBoejer reopened this Oct 23, 2024
@JSBoejer
Collaborator

Can you provide the bin-motifs.tsv you are using?

@Ge0rges
Author

Ge0rges commented Oct 23, 2024

Here it is:

bin                      mod_type  motif   mod_position  n_mod_bin  n_nomod_bin  motif_type  motif_complement  mod_position_complement  n_mod_complement  n_nomod_complement
brevundimonas_r-contigs  m         GGCGCC  2             130        159          palindrome  GGCGCC            2                        130               159
metagenome_assembly      m         GGCGCC  2             130        159          palindrome  GGCGCC            2                        130               159

@JSBoejer
Collaborator

JSBoejer commented Oct 28, 2024

The error arises from a filtering step in the motif assignment process. MTase-linker only assigns motifs that are methylated in more than 50% of their occurrences across the entire genome. This is defined by the formula:

n_mod_bin / (n_mod_bin + n_nomod_bin) > 0.5

From the literature (Beaulaurier 2019), we know that if a methylation motif is targeted by an MTase, typically >95% of motif occurrences are methylated. This is why we chose a threshold of 50%.

In your case, the two motifs have a methylation level below this threshold. As a result, MTase-linker filters these motifs out and attempts to assign an empty table, leading to the error. Thus, currently MTase-linker does not support the assignment of these motifs. Would you be interested in a configurable flag that could adjust this threshold?
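
To make the filtering step concrete, here is a minimal sketch of the formula above applied to the two rows posted in this thread, with the threshold exposed as a parameter and an explicit check for the empty result. It is plain Python rather than the pandas code in the pipeline, and the names `methylation_degree`, `filter_motifs`, and `min_degree` are hypothetical, not existing MTase-linker options.

```python
def methylation_degree(n_mod, n_nomod):
    """Fraction of motif occurrences called methylated: n_mod / (n_mod + n_nomod)."""
    total = n_mod + n_nomod
    return n_mod / total if total else 0.0


def filter_motifs(rows, min_degree=0.5):
    """Keep rows whose methylation degree is strictly above min_degree.

    min_degree is a hypothetical parameter standing in for the proposed flag.
    """
    return [
        row for row in rows
        if methylation_degree(row["n_mod_bin"], row["n_nomod_bin"]) > min_degree
    ]


# The two rows from the bin-motifs.tsv posted above (relevant columns only).
rows = [
    {"bin": "brevundimonas_r-contigs", "motif": "GGCGCC", "n_mod_bin": 130, "n_nomod_bin": 159},
    {"bin": "metagenome_assembly", "motif": "GGCGCC", "n_mod_bin": 130, "n_nomod_bin": 159},
]

# 130 / (130 + 159) = 130 / 289, which is just under 0.45.
print(round(methylation_degree(130, 159), 3))

kept = filter_motifs(rows)
if not kept:
    # Handling the empty case explicitly avoids operating on an empty table later.
    print("No motifs pass the methylation degree threshold; nothing to assign.")
```

With a lower value such as `filter_motifs(rows, min_degree=0.4)`, both rows are retained, which is the behaviour a configurable threshold would enable.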

It would be interesting to filter the modkit pileup for methylations related to the motif, and then make a similar plot to the ones in figure S8 of the Nanomotif article. I guess you would see something like the middle plot for figure S8.
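
As a starting point for that kind of plot, the sketch below locates the modified base of each motif occurrence on the forward strand and looks up a per-position methylation fraction. It makes several assumptions: `mod_position` is treated as a 0-based offset into the motif, only plain A/C/G/T motifs are handled (no degenerate bases, no reverse strand), and the fractions are made-up toy values standing in for numbers parsed from a modkit pileup. The function names are hypothetical.

```python
import re


def motif_mod_positions(sequence, motif, mod_position):
    """0-based coordinates of the modified base for each forward-strand motif hit."""
    # The lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile(f"(?=({re.escape(motif)}))")
    return [match.start() + mod_position for match in pattern.finditer(sequence)]


def fractions_at(positions, fraction_by_position):
    """Collect the methylation fraction for every position that has pileup data."""
    return [fraction_by_position[p] for p in positions if p in fraction_by_position]


# Toy contig containing the motif twice (illustration only).
sequence = "AAGGCGCCTTGGCGCCA"
positions = motif_mod_positions(sequence, "GGCGCC", 2)

# Made-up fractions keyed by 0-based position; real values would come from the pileup.
toy_fractions = {4: 0.42, 7: 0.05, 12: 0.47}
motif_fractions = fractions_at(positions, toy_fractions)

print(positions)
print(motif_fractions)
```

A histogram of `motif_fractions` over all occurrences in the genome would then show whether the motif is partially methylated everywhere or fully methylated at only some sites.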

You might also consider adjusting --threshold_methylation_general, which determines whether a position is considered methylated or not.

For further details, you might find this previous discussion helpful: link to issue #60.

@Ge0rges
Author

Ge0rges commented Oct 28, 2024

Hi @JSBoejer, that would be a good flag to have. Generating something like S8 would indeed be interesting! Thanks.

JSBoejer changed the title from "KeyError in MTase-linker" to "MTase-linker: Flag for methylation degree threshold" Nov 4, 2024
SorenHeidelbach added the "enhancement" (New feature or request) label Dec 17, 2024