Using trypsin vs no enzyme changes score on same PSM? #148

Open
glormph opened this issue Oct 20, 2023 · 4 comments

@glormph

glormph commented Oct 20, 2023

Hi, I have a peptidomics run searched against a DB of pre-digested peptides, including those with one missed cleavage. I recently discovered -ignoreMetCleavage 1 -enzyme 9 (no cleavage) and started using them. After that, a previously reported peptide, MKDTDNEEEIR, disappeared from my results: it still matched, but at a much lower score (RawScore ~60 vs. ~160 previously).

I played with the parameters and found that:

  • with -ntt 2 and -e 9, the NTT in the mzIdentML is reported as 0
  • if the only change is -e 9 -> -e 1 (trypsin), keeping -ignoreMetCleavage 1, the peptide above has more matched peaks and a higher RawScore.

The only differences in the <AnalysisProtocolCollection> between the trypsin and no-enzyme runs are:

$ diff tryp.analysis notryp.analysis 
15c15
<       <userParam name="NumTolerableTermini" value="2"/>
---
>       <userParam name="NumTolerableTermini" value="0"/>
42c42
<       <Enzyme semiSpecific="false" missedCleavages="-1" id="Tryp">
---
>       <Enzyme semiSpecific="true" missedCleavages="-1" id="NoCleavage">
44c44
<           <cvParam cvRef="PSI-MS" accession="MS:1001251" name="Trypsin"/>
---
>           <cvParam cvRef="PSI-MS" accession="MS:1001955" name="no cleavage"/>

What catches my eye most is the difference in RawScore and in the number of matched ions. The diff between the SpectrumIdentificationResults for this scan in the two searches is:

$ diff tryp.MKDTDNEEEIR notryp.MKDTDNEEEIR 
3,7c3,7
<           <PeptideEvidenceRef peptideEvidence_ref="PepEv_20762373_MKDTDNEEEIR_1"/>
<           <cvParam cvRef="PSI-MS" accession="MS:1002049" name="MS-GF:RawScore" value="161"/>
<           <cvParam cvRef="PSI-MS" accession="MS:1002050" name="MS-GF:DeNovoScore" value="162"/>
<           <cvParam cvRef="PSI-MS" accession="MS:1002052" name="MS-GF:SpecEValue" value="7.9780874E-16"/>
<           <cvParam cvRef="PSI-MS" accession="MS:1002053" name="MS-GF:EValue" value="6.539515E-8"/>
---
>           <PeptideEvidenceRef peptideEvidence_ref="PepEv_13257102_MKDTDNEEEIR_1"/>
>           <cvParam cvRef="PSI-MS" accession="MS:1002049" name="MS-GF:RawScore" value="66"/>
>           <cvParam cvRef="PSI-MS" accession="MS:1002050" name="MS-GF:DeNovoScore" value="124"/>
>           <cvParam cvRef="PSI-MS" accession="MS:1002052" name="MS-GF:SpecEValue" value="7.1621056E-9"/>
>           <cvParam cvRef="PSI-MS" accession="MS:1002053" name="MS-GF:EValue" value="0.49847928"/>
10,22c10,22
<           <userParam name="ExplainedIonCurrentRatio" value="0.28976208"/>
<           <userParam name="NTermIonCurrentRatio" value="0.18838717"/>
<           <userParam name="CTermIonCurrentRatio" value="0.101374924"/>
<           <userParam name="MS2IonCurrent" value="5932224.0"/>
<           <userParam name="NumMatchedMainIons" value="19"/>
<           <userParam name="MeanErrorAll" value="2.4083107"/>
<           <userParam name="StdevErrorAll" value="3.6545222"/>
<           <userParam name="MeanErrorTop7" value="0.85264057"/>
<           <userParam name="StdevErrorTop7" value="0.54933465"/>
<           <userParam name="MeanRelErrorAll" value="-1.0829566"/>
<           <userParam name="StdevRelErrorAll" value="4.240601"/>
<           <userParam name="MeanRelErrorTop7" value="-0.6560206"/>
<           <userParam name="StdevRelErrorTop7" value="0.77356416"/>
---
>           <userParam name="ExplainedIonCurrentRatio" value="0.12744826"/>
>           <userParam name="NTermIonCurrentRatio" value="0.028266326"/>
>           <userParam name="CTermIonCurrentRatio" value="0.09918193"/>
>           <userParam name="MS2IonCurrent" value="6465587.5"/>
>           <userParam name="NumMatchedMainIons" value="12"/>
>           <userParam name="MeanErrorAll" value="1.9757134"/>
>           <userParam name="StdevErrorAll" value="1.3534566"/>
>           <userParam name="MeanErrorTop7" value="2.0432508"/>
>           <userParam name="StdevErrorTop7" value="1.2680426"/>
>           <userParam name="MeanRelErrorAll" value="-0.782619"/>
>           <userParam name="StdevRelErrorAll" value="2.2633593"/>
>           <userParam name="MeanRelErrorTop7" value="-0.86563396"/>
>           <userParam name="StdevRelErrorTop7" value="2.2435427"/>
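To pull these per-PSM values out of the two mzIdentML files and compare them side by side, a small ElementTree script is enough. This is a minimal sketch: the snippets below are trimmed to a few parameters whose values come from the diff above, while the SpectrumIdentificationItem id "SII_1_1" and the helper names are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Trimmed, illustrative snippets; values are from the diff, the id is hypothetical.
NS = "http://psidev.info/psi/pi/mzIdentML/1.1"
TRYP_SNIPPET = f"""<SpectrumIdentificationItem xmlns="{NS}" id="SII_1_1">
  <cvParam cvRef="PSI-MS" accession="MS:1002049" name="MS-GF:RawScore" value="161"/>
  <cvParam cvRef="PSI-MS" accession="MS:1002050" name="MS-GF:DeNovoScore" value="162"/>
  <userParam name="NumMatchedMainIons" value="19"/>
</SpectrumIdentificationItem>"""
NOTRYP_SNIPPET = f"""<SpectrumIdentificationItem xmlns="{NS}" id="SII_1_1">
  <cvParam cvRef="PSI-MS" accession="MS:1002049" name="MS-GF:RawScore" value="66"/>
  <cvParam cvRef="PSI-MS" accession="MS:1002050" name="MS-GF:DeNovoScore" value="124"/>
  <userParam name="NumMatchedMainIons" value="12"/>
</SpectrumIdentificationItem>"""


def local_name(tag: str) -> str:
    """Drop the namespace part of an ElementTree tag: '{ns}cvParam' -> 'cvParam'."""
    return tag.rsplit("}", 1)[-1]


def psm_params(root: ET.Element) -> dict[str, dict[str, str]]:
    """Map each SpectrumIdentificationItem id to its {param name: value} dict."""
    result: dict[str, dict[str, str]] = {}
    for item in root.iter():  # iter() includes root itself
        if local_name(item.tag) != "SpectrumIdentificationItem":
            continue
        result[item.get("id", "")] = {
            child.get("name"): child.get("value")
            for child in item
            if local_name(child.tag) in ("cvParam", "userParam")
            and child.get("value") is not None
        }
    return result


def differing(a: dict[str, str], b: dict[str, str], names: list[str]) -> list[tuple]:
    """Return (name, value_in_a, value_in_b) for each name whose values differ."""
    return [(n, a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)]


if __name__ == "__main__":
    tryp = psm_params(ET.fromstring(TRYP_SNIPPET))["SII_1_1"]
    notryp = psm_params(ET.fromstring(NOTRYP_SNIPPET))["SII_1_1"]
    names = ["MS-GF:RawScore", "MS-GF:DeNovoScore", "NumMatchedMainIons"]
    for name, x, y in differing(tryp, notryp, names):
        print(f"{name}: trypsin={x} no-cleavage={y}")
```

For real output files, parse with ET.parse(path).getroot() instead; since item ids differ between runs, matching on the parent SpectrumIdentificationResult's spectrumID attribute is the more robust key.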

So I'm wondering: is the lower score related to the termini setting, or to the enzyme itself?

This question may be related to #120

@glormph
Author

glormph commented Oct 20, 2023

Interestingly, although NTT is 0 in the XML, the SearchParams printed to stdout show it as 2:

        PrecursorMassTolerance: 10.0 ppm
        IsotopeError: -1,2
        TargetDecoyAnalysis: false
        FragmentationMethod: As written in the spectrum or CID if no info
        Instrument: QExactive (Q-Exactive)
        Enzyme: NoCleavage
        Protocol: TMT
        NumTolerableTermini: 2
        IgnoreMetCleavage: true

@FarmGeek4Life
Collaborator

NumTolerableTermini in the mzid results is overridden here, forced to '0' when the enzyme setting is 'Unspecific Cleavage' or 'No Cleavage': https://github.com/MSGFPlus/msgfplus/blob/master/src/main/java/edu/ucsd/msjava/mzid/AnalysisProtocolCollectionGen.java#L100
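The override can be paraphrased as below. This is a Python sketch of the behavior described here, not the linked Java source: the -e IDs 1 and 9 are the ones used in this thread, and 0 for 'Unspecific Cleavage' is my reading of the MS-GF+ usage text. It shows why stdout (the configured value) and the mzid (the reported value) disagree.

```python
# Subset of MS-GF+ -e enzyme IDs (0 assumed from the usage text; 1 and 9 as used above).
UNSPECIFIC_CLEAVAGE = 0
TRYPSIN = 1
NO_CLEAVAGE = 9


def reported_ntt(enzyme_id: int, configured_ntt: int) -> int:
    """Value written as NumTolerableTermini in the mzid AnalysisProtocolCollection.

    The configured -ntt is what the SearchParams on stdout show; the mzid
    reports 0 for enzymes that have no specific cleavage residue.
    """
    if enzyme_id in (UNSPECIFIC_CLEAVAGE, NO_CLEAVAGE):
        return 0
    return configured_ntt


if __name__ == "__main__":
    for enzyme_id in (TRYPSIN, NO_CLEAVAGE):
        print(f"-e {enzyme_id} -ntt 2: stdout shows 2, mzid reports {reported_ntt(enzyme_id, 2)}")
```

With -ntt 2 this reproduces the AnalysisProtocolCollection diff in the original post: 2 for trypsin, 0 for no cleavage.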

A non-zero '-ntt' is technically invalid for the enzymes 'No Cleavage' and 'Unspecific Cleavage': for 'No Cleavage' there is no cleavage residue to verify or enforce, and for 'Unspecific Cleavage' there is no specific cleavage residue because every residue is a possible cleavage point.
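To make the "no residue to verify" point concrete, here is a simplified Python model of counting enzymatic termini. It is an illustrative assumption, not MS-GF+'s implementation: protein (DB entry) termini always count, trypsin is modeled as cleaving after K/R with no proline rule, and N-terminal Met handling (-ignoreMetCleavage) is left out. With an empty residue set, i.e. a 'No Cleavage'-like enzyme, only the entry's own termini can count, so under this model -ntt 2 admits only whole entries.

```python
def count_enzymatic_termini(protein: str, start: int, end: int, cleaves_after: str = "KR") -> int:
    """Count termini of protein[start:end] consistent with the enzyme (simplified model).

    - A protein terminus always counts.
    - The N-terminus counts if the residue before the peptide is in `cleaves_after`.
    - The C-terminus counts if the peptide's last residue is in `cleaves_after`.
    No proline rule and no N-terminal Met handling are modeled.
    """
    if not (0 <= start < end <= len(protein)):
        raise ValueError("invalid peptide span")
    n_term = start == 0 or protein[start - 1] in cleaves_after
    c_term = end == len(protein) or protein[end - 1] in cleaves_after
    return int(n_term) + int(c_term)


if __name__ == "__main__":
    protein = "AKMKDTDNEEEIRGG"  # hypothetical entry containing MKDTDNEEEIR
    print(count_enzymatic_termini(protein, 2, 13))                    # MKDTDNEEEIR, trypsin
    print(count_enzymatic_termini(protein, 3, 13))                    # KDTDNEEEIR, trypsin
    print(count_enzymatic_termini(protein, 2, 13, cleaves_after=""))  # no cleavage residues
```

In this hypothetical entry, MKDTDNEEEIR has two tryptic termini and KDTDNEEEIR has one; with no cleavage residues neither terminus counts, because the peptide does not start or end at the entry boundary.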

You might want to use '-ntt 0' for 'No Cleavage': it appears that if a non-zero value is supplied for '-ntt', you still get some enzyme-specific search behavior, and with a predigested peptide DB you probably don't want that. We might also enforce this in the code.

In 2018 we had to fix 'No Cleavage' so that it was no longer treated the same as 'Unspecific Cleavage'; other changes may also be needed to get the correct behavior everywhere. However, as mentioned before, MS-GF+ is in maintenance mode: we don't have the time or funding to put significant effort into improvements, but we will accept reasonable pull requests.

@glormph
Author

glormph commented Oct 20, 2023

Yes, forcing ntt to 0 for 'no cleavage' makes sense. I'll try -ntt 0 and see whether it makes a difference to the scoring.

I understand that it is in maintenance mode; bad luck for me, but that sounds reasonable!

@glormph
Author

glormph commented Oct 22, 2023

With ntt set to 0 and no cleavage, the search becomes very slow (24 hours, where an ntt 2 search takes about 3.5 hours), and the matched peptides seem to result from unspecific cleavage. It also did not improve the score of the PSM mentioned above.

I tried to go through the code a bit to understand why, but my understanding of it is limited. Maybe the number of peptides to consider becomes too large at this line? https://github.com/MSGFPlus/msgfplus/blob/master/src/main/java/edu/ucsd/msjava/msdbsearch/DBScanner.java#L280
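One plausible reading of the slowdown, offered as a rough model rather than a description of what DBScanner actually does: with a predigested DB and no cleavage residues, -ntt 2 leaves one candidate per entry (the entry itself), while -ntt 0 admits every subsequence that passes the length filter. The sketch counts candidates per entry by brute force; the 6 and 40 residue bounds are assumed to match MS-GF+'s default -minLength/-maxLength, and precursor-mass filtering, which decides how many candidates are actually scored, is ignored.

```python
def count_candidates(entry_len: int, ntt: int, min_len: int = 6, max_len: int = 40) -> int:
    """Count subsequences of one DB entry that pass the length filter and have at
    least `ntt` termini coinciding with the entry's own termini (no cleavage residues)."""
    total = 0
    for start in range(entry_len):
        # `end` is exclusive, so the subsequence length is end - start.
        for end in range(start + min_len, min(start + max_len, entry_len) + 1):
            termini = (start == 0) + (end == entry_len)  # bools add as ints
            if termini >= ntt:
                total += 1
    return total


if __name__ == "__main__":
    for length in (11, 30):
        counts = {ntt: count_candidates(length, ntt) for ntt in (2, 1, 0)}
        print(f"entry length {length}: {counts}")
```

For an 11-residue entry such as MKDTDNEEEIR this gives 1, 11 and 21 candidates for ntt 2, 1 and 0; for a 30-residue entry, 1, 49 and 325, so the growth is roughly quadratic in entry length.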

I'm a bit in over my head here, and I still haven't answered my actual question: why does a tryptic peptide get a lower score when searching with no cleavage? Also, it's the weekend :)

2 participants