Java.lang.NegativeArraySizeException #140
Comments
This issue is caused by a limitation of the current Java implementation of MS-GF+, and fixing it is neither a simple nor a quick change. The cause is an overflow of array sizes: Java arrays are indexed with int, so fixing it would involve changing arrays in many places to an array type that supports indexing with long instead of int.
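For readers wondering what "an array type that supports indexing with long" could look like, here is a minimal sketch. It is not code from MS-GF+; the class name LongIndexedBytes and the chunkBits parameter are hypothetical, and it only illustrates the general idea of spreading the data over several int-indexed arrays.

```java
/**
 * Hypothetical sketch (not part of MS-GF+): a byte container addressed by a
 * long index, backed by several ordinary int-indexed byte arrays.
 */
public class LongIndexedBytes {
    private final int chunkBits;   // each chunk holds 2^chunkBits bytes
    private final long chunkMask;  // low bits select the offset inside a chunk
    private final byte[][] chunks;
    private final long length;

    public LongIndexedBytes(long length, int chunkBits) {
        if (length < 0) {
            throw new IllegalArgumentException("negative length: " + length);
        }
        if (chunkBits < 1 || chunkBits > 30) {
            throw new IllegalArgumentException("chunkBits must be in 1..30");
        }
        this.length = length;
        this.chunkBits = chunkBits;
        long chunkSize = 1L << chunkBits;
        this.chunkMask = chunkSize - 1;
        // Round up so that a partially filled last chunk is included.
        int chunkCount = Math.toIntExact((length + chunkSize - 1) >>> chunkBits);
        this.chunks = new byte[chunkCount][];
        long remaining = length;
        for (int i = 0; i < chunkCount; i++) {
            chunks[i] = new byte[(int) Math.min(remaining, chunkSize)];
            remaining -= chunks[i].length;
        }
    }

    public long length() {
        return length;
    }

    public byte get(long index) {
        checkIndex(index);
        return chunks[(int) (index >>> chunkBits)][(int) (index & chunkMask)];
    }

    public void set(long index, byte value) {
        checkIndex(index);
        chunks[(int) (index >>> chunkBits)][(int) (index & chunkMask)] = value;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
    }
}
```

Every place that currently reads or writes a plain array would have to go through get and set with long indices, which is why such a change touches many places rather than one line.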
We do have other tools that we use for splitting FASTA files into small enough pieces, searching the data files against each piece, and then merging all results for a single data file back into one mzid file.
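The splitting step of that workaround can be sketched roughly as follows. This is not the PNNL tool mentioned above; the class FastaSplitter, the part_N.fasta naming, and the maxResidues threshold are all assumptions made for illustration, and the merge step is not covered.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split a FASTA file into smaller files at entry boundaries. */
public class FastaSplitter {

    /**
     * Writes part_1.fasta, part_2.fasta, ... into outputDir. A new part is
     * started at the first header line seen after the current part has reached
     * maxResidues sequence characters, so entries are never cut in half and a
     * part can exceed the threshold by at most one entry.
     */
    public static List<Path> split(Path input, Path outputDir, long maxResidues)
            throws IOException {
        List<Path> parts = new ArrayList<>();
        Files.createDirectories(outputDir);
        BufferedWriter out = null;
        long residuesInPart = 0;
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                boolean isHeader = line.startsWith(">");
                if (isHeader && (out == null || residuesInPart >= maxResidues)) {
                    if (out != null) {
                        out.close();
                    }
                    Path part = outputDir.resolve("part_" + (parts.size() + 1) + ".fasta");
                    parts.add(part);
                    out = Files.newBufferedWriter(part, StandardCharsets.UTF_8);
                    residuesInPart = 0;
                }
                if (out == null) {
                    continue; // ignore anything before the first header line
                }
                out.write(line);
                out.newLine();
                if (!isHeader) {
                    residuesInPart += line.trim().length();
                }
            }
        } finally {
            if (out != null) {
                out.close();
            }
        }
        return parts;
    }
}
```

The threshold would need to be chosen with some margin, since indexing with -tda 1 creates a revCat file that also contains decoy entries.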
From: Dominik Lux ***@***.***>
Sent: Friday, July 1, 2022 12:56:51 AM
Subject: [MSGFPlus/msgfplus] Java.lang.NegativeArraySizeException (Issue #140)
Describe the bug
Hello everyone, I am currently trying to index large peptide FASTA files (~50 GB) for peptide searches. This FASTA file contains 85748938 entries of short peptides (all of them unique). I am using the BuildSA function and call it as follows:
java -Xmx256000M -cp <PATH>/MSGFPLUS_v20220418/MSGFPlus.jar edu.ucsd.msjava.msdbsearch.BuildSA -d mouse_mzml_specific_peptides.fasta -tda 1 -decoy XXX
and I get the following error from MSGF+:
Creating peptides.revCat.fasta.
Building suffix array: /mntc/<PATH>/work/f5/71e50c34429da341c0ad240e4f40ed/peptides.revCat.fasta
Exception in thread "main" java.lang.NegativeArraySizeException: -541141435
at edu.ucsd.msjava.msdbsearch.CompactFastaSequence.readSequence(CompactFastaSequence.java:542)
at edu.ucsd.msjava.msdbsearch.CompactFastaSequence.<init>(CompactFastaSequence.java:139)
at edu.ucsd.msjava.msdbsearch.CompactFastaSequence.<init>(CompactFastaSequence.java:89)
at edu.ucsd.msjava.msdbsearch.BuildSA.buildSAFiles(BuildSA.java:144)
at edu.ucsd.msjava.msdbsearch.BuildSA.buildSA(BuildSA.java:96)
at edu.ucsd.msjava.msdbsearch.BuildSA.main(BuildSA.java:56)
This leads to the following lines: https://github.com/MSGFPlus/msgfplus/blob/11b6e2e5a1caac0f429a0bd3bffda6672853abae/src/main/java/edu/ucsd/msjava/msdbsearch/CompactFastaSequence.java#L531-L552
I was wondering whether this error could be fixed quickly, since I would like to use MSGF+ for identification even with the large FASTA files I am using here. Maybe it is simply a matter of using long instead of int, because a possible overflow is happening here. But I cannot judge whether other places would need to be adjusted.
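To illustrate the suspected overflow, here is a small standalone sketch. It does not reproduce the MS-GF+ code at the linked lines; the class ArraySizeOverflowDemo is hypothetical, and 3753825861 is just one long value that wraps to the number in the exception message, not a size taken from the actual run.

```java
/** Hypothetical sketch: how a size above Integer.MAX_VALUE can become negative. */
public class ArraySizeOverflowDemo {

    /** A narrowing cast keeps only the low 32 bits, so large values can wrap. */
    static int narrowToInt(long size) {
        return (int) size;
    }

    /** Math.toIntExact throws ArithmeticException instead of wrapping silently. */
    static int checkedSize(long size) {
        return Math.toIntExact(size);
    }

    public static void main(String[] args) {
        // Assumed value for illustration only: 3753825861 - 2^32 = -541141435.
        long hypotheticalSize = 3_753_825_861L;
        int wrapped = narrowToInt(hypotheticalSize);
        System.out.println("wrapped size: " + wrapped);

        try {
            byte[] buffer = new byte[wrapped]; // negative length is rejected at runtime
            System.out.println("allocated " + buffer.length + " bytes");
        } catch (NegativeArraySizeException e) {
            System.out.println("allocation failed: " + e);
        }

        try {
            checkedSize(hypotheticalSize);
        } catch (ArithmeticException e) {
            System.out.println("checked conversion failed: " + e.getMessage());
        }
    }
}
```

Switching the size variable to long would avoid the wrap in the arithmetic, but the array allocation itself still takes an int length, so the array type would have to change as well.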
Thanks for the quick answer and the clarification! Yes, splitting FASTA files is always an option. However, I would look forward to running a search with a single large FASTA file. If this is not a priority or not planned, then you can close this issue.