
Error with FIMO hits() call, index out of bounds #18

Open
Al-Murphy opened this issue Aug 27, 2024 · 0 comments

@Al-Murphy
Contributor

Al-Murphy commented Aug 27, 2024

Hey! Again, just wanted to say what a great resource this package is, which is why I'm keen to help improve it and call out any issues I find.

To this end, I came across this issue when running tangermeme.tools.fimo on different PWMs from JASPAR. Firstly, consider the example below, which takes two motifs from JASPAR and runs each one separately through tangermeme's FIMO:

from pyjaspar import jaspardb
import torch

# Create the JASPAR2024 release object
jdb_obj = jaspardb(release='JASPAR2024')
# Fetch motifs by ID
jasp_motifs = [jdb_obj.fetch_motif_by_id('MA0634.2'), jdb_obj.fetch_motif_by_id('MA0875.2')]

alphabet = ['A', 'C', 'G', 'T']
pwm_list = []
pwm_names = []
for pwm_i in jasp_motifs:
    pwm_names.append(pwm_i.matrix_id)
    pwm_alph = []
    for base in alphabet:
        # One (1, motif_length) row of counts per base
        pwm_alph.append(torch.tensor(pwm_i.counts[base]).unsqueeze(0))
    # Stack to (4, motif_length), then transpose to (motif_length, 4)
    pwm_list.append(torch.concat(pwm_alph, dim=0).T)
pwms = dict(zip(pwm_names, pwm_list))
# pwms['MA0634.2'].shape => torch.Size([6, 4]) (motif_length, alphabet_size), as expected

from tangermeme.tools.fimo import FIMO
from tangermeme.utils import random_one_hot

motif_len = 40  # length of each random sequence (not the motif width)
batch_size = 20

X = random_one_hot((batch_size, 4, motif_len), random_state=0)

# Run each motif through FIMO on its own
for key_i in pwms:
    model = FIMO({key_i: pwms[key_i]})
    hits = model.hits(X, threshold=0.01)

This runs without error (using the same motif_len as your example code); however, when I set motif_len = 100 instead of 40, it fails on MA0634.2 with:

--> 312 pval = math.pow(2, self._score_to_pval[motif_idx][score_idx])    (site-packages/tangermeme/tools/fimo.py:312)

IndexError: index 784 is out of bounds for axis 0 with size 783

I tried tracing this back, and it appears to come from either _pwm_to_mapping() (where the smallest value or the shape of the mapping may be wrong) or the _score_to_pval() lookup, since the index is calculated from the score as follows:

score_idx = int(score / self.bin_size) - self._smallest[motif_idx]
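To make the suspected failure mode concrete, here is a minimal, hypothetical sketch in plain NumPy (not tangermeme's actual code) of how a FIMO-style score-to-p-value table is typically built: per-position scores are binned to integers, the exact score distribution under a uniform background is accumulated position by position, and the table only covers bins from the smallest to the largest achievable total score. The function names score_distribution, pval_table and lookup_pval are illustrative. Assuming tangermeme's table is built along similar lines, its size is largest - smallest + 1, so any observed score whose bin lands even one step past the largest bin raises the same kind of IndexError seen above.

```python
import numpy as np

def score_distribution(binned_pwm, background=(0.25, 0.25, 0.25, 0.25)):
    """Exact distribution of integer-binned motif scores (hypothetical helper).

    binned_pwm: (motif_length, 4) integer array of per-position scores,
    already divided into bins. Returns (smallest, probs), where probs[k]
    is the probability of a total score of smallest + k bins.
    """
    dist = np.array([1.0])  # empty prefix: total score 0 with probability 1
    smallest = 0
    for row in binned_pwm:
        lo, hi = int(row.min()), int(row.max())
        new = np.zeros(len(dist) + hi - lo)
        for b, p in zip(row, background):
            start = int(b) - lo  # shift the distribution by this base's score
            new[start:start + len(dist)] += p * dist
        dist = new
        smallest += lo
    return smallest, dist

def pval_table(probs):
    # P(score >= s): reverse cumulative sum over the bins
    return np.cumsum(probs[::-1])[::-1]

def lookup_pval(table, smallest, score_bin):
    # Same shape as the quoted index: bin minus the smallest bin.
    # No bounds check: a bin past the largest raises IndexError, and a bin
    # below the smallest would silently wrap around via a negative index.
    return table[score_bin - smallest]

binned = np.array([[0, 1, 2, 3], [3, 2, 1, 0], [-1, 0, 0, 2]])
smallest, probs = score_distribution(binned)
table = pval_table(probs)
largest = smallest + len(table) - 1
print(smallest, largest, len(table))
print(lookup_pval(table, smallest, largest))
try:
    lookup_pval(table, smallest, largest + 1)
except IndexError as err:
    print("IndexError:", err)
```

In this toy case the table has 10 entries (bins -1 to 8), so asking for bin 9 fails with an "out of bounds for axis 0 with size 10" message, the same pattern as the errors reported in this issue.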

To note, this error is not tied to the sequence length alone: I got the same error (just with different index values) with MA1535.2. Changing the code above to use this motif, we get the error:

IndexError: index 1390 is out of bounds for axis 0 with size 1390

So it appears to depend on the combination of the sequence length (motif_len) and the specific motif PWM.
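One possible (unconfirmed) reason the sequence length matters: the failing indices sit at or just past the end of the table, i.e. at the most extreme scores, which are rare. Scanning more positions raises the chance that at least one window lands in such a bin. If a bin has per-window probability p and there are N roughly independent windows, then P(at least one) = 1 - (1 - p)^N. The sketch below assumes N = batch_size * (seq_len - motif_width + 1) * 2, where the factor of 2 for both strands is an assumption about how hits() scans, and uses p = 0.25**6, the uniform-background probability of the single best 6-mer for a width-6 motif with one top base per position. The helper names are hypothetical.

```python
def n_windows(batch_size, seq_len, motif_width, strands=2):
    # Number of motif-sized windows scanned (strands=2 is an assumption)
    return batch_size * (seq_len - motif_width + 1) * strands

def prob_at_least_one(p, n):
    # Chance that at least one of n independent windows falls in a bin of probability p
    return 1.0 - (1.0 - p) ** n

p_extreme = 0.25 ** 6  # hypothetical: best 6-mer under a uniform background
for seq_len in (40, 100):
    n = n_windows(batch_size=20, seq_len=seq_len, motif_width=6)
    print(seq_len, n, round(prob_at_least_one(p_extreme, n), 3))
```

With a fixed random_state the outcome for a given setting is deterministic, but longer sequences cover more windows, so a motif whose extreme bins are mis-indexed is more likely to trip over one.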

Secondly, as you might imagine, this is specific to hits(): running y = model(X.float()) doesn't raise any errors.
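The forward pass only returns scores, so it never indexes the score-to-p-value table; the failure is confined to the conversion step in hits(). As one possible place to start (a hypothetical sketch, not a tested patch to tangermeme), the index computation could be guarded. Note that Python's int() truncates toward zero, so for negative scores it lands one bin higher than math.floor would; if the table was binned with floor or rounding, that alone could shift indices by one. score_to_index is an illustrative name, and score, bin_size, smallest and table_size stand in for the quantities in the quoted line.

```python
import math

def truncated_index(score, bin_size, smallest):
    # Same arithmetic as the quoted line: int() truncates toward zero
    return int(score / bin_size) - smallest

def score_to_index(score, bin_size, smallest, table_size):
    # Hypothetical guarded version: floor for consistent binning of
    # negative scores, then clamp into [0, table_size - 1]
    idx = math.floor(score / bin_size) - smallest
    return min(max(idx, 0), table_size - 1)

# bin_size = 0.5 is exactly representable, so -1.25 / 0.5 == -2.5 exactly
print(truncated_index(-1.25, 0.5, smallest=-10))  # int(-2.5) == -2 -> 8
print(score_to_index(-1.25, 0.5, -10, 30))        # floor(-2.5) == -3 -> 7
print(score_to_index(100.0, 0.5, -10, 30))        # 200 + 10 clamped to 29
```

Clamping would only hide the symptom: a score above the largest bin gets the p-value of the best achievable score, so the underlying mismatch between how the table and the observed scores are binned would still be worth tracking down.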

Happy to dig into this further; do you have any ideas on where to start?

Cheers,
Alan.

Package versions
Package                  Version
------------------------ ----------
asttokens                2.4.1
biopython                1.84
captum                   0.7.0
comm                     0.2.2
contourpy                1.2.1
cycler                   0.12.1
debugpy                  1.6.7
decorator                5.1.1
exceptiongroup           1.2.2
executing                2.0.1
filelock                 3.15.4
fonttools                4.53.1
fsspec                   2024.6.1
httplib2                 0.22.0
importlib_metadata       8.2.0
iniconfig                2.0.0
ipykernel                6.29.5
ipytest                  0.14.2
ipython                  8.26.0
jedi                     0.19.1
Jinja2                   3.1.4
joblib                   1.4.2
jupyter_client           8.6.2
jupyter_core             5.7.2
kiwisolver               1.4.5
llvmlite                 0.43.0
logomaker                0.8
MarkupSafe               2.1.5
matplotlib               3.9.2
matplotlib-inline        0.1.7
mpmath                   1.3.0
nest_asyncio             1.6.0
networkx                 3.3
numba                    0.60.0
numpy                    2.0.1
nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        9.1.0.70
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-nccl-cu12         2.20.5
nvidia-nvjitlink-cu12    12.6.20
nvidia-nvtx-cu12         12.1.105
packaging                24.1
pandas                   2.2.2
parso                    0.8.4
pexpect                  4.9.0
pickleshare              0.7.5
pillow                   10.4.0
pip                      24.2
platformdirs             4.2.2
pluggy                   1.5.0
prompt_toolkit           3.0.47
psutil                   5.9.0
ptyprocess               0.7.0
pure_eval                0.2.3
pyBigWig                 0.3.23
pyfaidx                  0.8.1.2
Pygments                 2.18.0
pyjaspar                 3.0.0
pyJasper                 0.41
pyparsing                3.1.2
pysam                    0.22.1
pytest                   8.3.2
python-dateutil          2.9.0
pytz                     2024.1
pyzmq                    25.1.2
scikit-learn             1.5.1
scipy                    1.14.0
seaborn                  0.13.2
setuptools               72.1.0
six                      1.16.0
stack-data               0.6.2
sympy                    1.13.2
tangermeme               0.2.3
threadpoolctl            3.5.0
torch                    2.4.0
tornado                  6.4.1
tqdm                     4.66.5
traitlets                5.14.3
triton                   3.0.0
typing_extensions        4.12.2
tzdata                   2024.1
wcwidth                  0.2.13
wheel                    0.43.0
zipp                     3.20.0