Java.lang.NegativeArraySizeException #140

Luxxii · 2022-07-01T07:56:39Z

Describe the bug
Hello everyone, I am currently trying to index large peptide FASTA files (~50 GB) for peptide searches. This FASTA file contains 85748938 entries of short peptides (all of them unique). I am using BuildSA and call it as follows:

java -Xmx256000M -cp <PATH>/MSGFPLUS_v20220418/MSGFPlus.jar edu.ucsd.msjava.msdbsearch.BuildSA -d peptides.fasta -tda 1 -decoy XXX

This produces the following error from MSGF+:

Creating peptides.revCat.fasta.
Building suffix array: /mntc/<PATH>/work/f5/71e50c34429da341c0ad240e4f40ed/peptides.revCat.fasta
Exception in thread "main" java.lang.NegativeArraySizeException: -541141435
        at edu.ucsd.msjava.msdbsearch.CompactFastaSequence.readSequence(CompactFastaSequence.java:542)
        at edu.ucsd.msjava.msdbsearch.CompactFastaSequence.<init>(CompactFastaSequence.java:139)
        at edu.ucsd.msjava.msdbsearch.CompactFastaSequence.<init>(CompactFastaSequence.java:89)
        at edu.ucsd.msjava.msdbsearch.BuildSA.buildSAFiles(BuildSA.java:144)
        at edu.ucsd.msjava.msdbsearch.BuildSA.buildSA(BuildSA.java:96)
        at edu.ucsd.msjava.msdbsearch.BuildSA.main(BuildSA.java:56)

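The stack trace points to lines 531-552 of CompactFastaSequence.java: https://github.com/MSGFPlus/msgfplus/blob/11b6e2e5a1caac0f429a0bd3bffda6672853abae/src/main/java/edu/ucsd/msjava/msdbsearch/CompactFastaSequence.java#L531-L552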

I was wondering if this error could be fixed quickly, since I would like to use MSGF+ for identification, even with FASTA files as large as the ones I am using here. Maybe it is only a simple matter of using long instead of int, because a possible overflow seems to be happening here. But I cannot judge whether other places need to be adjusted.
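To illustrate the suspected overflow, here is a minimal sketch of how narrowing a long length to int can produce a negative array size. The class and method names are hypothetical, and the length 3753825861 is an assumed value chosen only because it wraps to the number in the stack trace; the actual length computed inside CompactFastaSequence is not shown in this thread, and any value that differs from it by a multiple of 2^32 would wrap to the same result.

```java
public class ArraySizeOverflowDemo {

    /** Narrows a long length to int with a plain cast, which silently wraps around. */
    static int uncheckedSize(long length) {
        return (int) length;
    }

    /** Narrows a long length to int, but throws ArithmeticException if it does not fit. */
    static int checkedSize(long length) {
        return Math.toIntExact(length);
    }

    public static void main(String[] args) {
        // Assumed value for illustration only; it is larger than Integer.MAX_VALUE.
        long hypotheticalLength = 3_753_825_861L;

        int wrapped = uncheckedSize(hypotheticalLength);
        System.out.println(wrapped); // prints -541141435

        try {
            byte[] buffer = new byte[wrapped]; // a negative size is rejected at allocation time
            System.out.println(buffer.length);
        } catch (NegativeArraySizeException e) {
            System.out.println("Allocation failed: " + e);
        }
    }
}
```

A checked conversion such as Math.toIntExact would only turn the failure into a clearer error message. It would not lift the limit, because Java arrays are indexed with int.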


FarmGeek4Life · 2022-07-01T08:09:13Z

This issue is caused by a limitation of the current implementation of MS-GF+ in Java, and fixing it is neither a simple nor a quick change. The issue is due to overflows on array sizes, and fixing it would involve changing arrays in many places to use an array type that supports indexing with long instead of int. We do have other tools that we use for splitting FASTA files into small enough pieces, searching the data files against each piece, and then merging all results for a single data file back into one mzid file.
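The splitting step of that workaround could look roughly like the sketch below. This is not the tool referred to above, which is not named in this thread; FastaSplitter, the part_N.fasta naming, and the maxEntries parameter are all hypothetical. It only covers cutting the FASTA file at record boundaries. Searching each part and merging the results into one mzid file are not shown.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class FastaSplitter {

    /**
     * Splits a FASTA file into parts holding at most maxEntries records each.
     * Parts are written to outputDir as part_1.fasta, part_2.fasta, ...
     * (hypothetical naming). Returns the paths of the parts in order.
     */
    static List<Path> split(Path input, Path outputDir, int maxEntries) throws IOException {
        if (maxEntries <= 0) {
            throw new IllegalArgumentException("maxEntries must be positive");
        }
        Files.createDirectories(outputDir);
        List<Path> parts = new ArrayList<>();
        BufferedWriter writer = null;
        int entriesInPart = 0;
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                boolean isHeader = line.startsWith(">");
                // Start a new part at a header line once the current part is full.
                if (isHeader && (writer == null || entriesInPart == maxEntries)) {
                    if (writer != null) {
                        writer.close();
                    }
                    Path part = outputDir.resolve("part_" + (parts.size() + 1) + ".fasta");
                    writer = Files.newBufferedWriter(part, StandardCharsets.UTF_8);
                    parts.add(part);
                    entriesInPart = 0;
                }
                if (writer == null) {
                    continue; // ignore anything before the first header
                }
                if (isHeader) {
                    entriesInPart++;
                }
                writer.write(line);
                writer.newLine();
            }
        } finally {
            if (writer != null) {
                writer.close();
            }
        }
        return parts;
    }
}
```

Splitting by entry count keeps every record intact. How small each part must be to stay under the array-size limit is not stated in this thread, so maxEntries would have to be chosen by trial.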


Luxxii · 2022-07-01T09:35:25Z

Thanks for the quick answer and the clarification! Yes, splitting FASTA files is always an option. However, I would look forward to being able to run a search against a single large FASTA file.

If this is not a priority or not planned, then you can close this issue.
