MM tag preferred format for TAPS data #785
Apologies for the delay. Initially it was because I simply didn't know, having never heard of TAPS, but then I sadly got distracted and forgot to come back to this. I'd think N+m is probably the way to go. The specification states: "Note ‘N’ may be used to match any base rather than specifically an ‘N’ call by the sequencing instrument. This may be used in situations where the base modification is not a derivation of a standard base type". (I wouldn't trust T+m to work, as it could cause problems with validators. I haven't tried it, but I'd be surprised if it does work given the way the base counting works, and there may well be tools that explicitly check compatibility of the original and modified base types.)
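To make the base-counting difference concrete, here is a minimal Python sketch of how the MM value changes between N+m, T+m and C+m for a TAPS-style read. The helper name `build_mm_tag` and the example read are hypothetical, not part of any library, and the sketch assumes a forward-strand read with 0-based modification positions and no ML (likelihood) values.

```python
def build_mm_tag(seq, mod_positions, base="N", strand="+", code="m"):
    """Build an MM tag value for one modification type.

    Hypothetical helper for illustration only. Assumes a forward-strand
    read and 0-based positions into seq. Each number in the result is the
    count of candidate bases skipped before the next modified base; with
    base="N" every base in SEQ counts as a candidate.
    """
    targets = set(mod_positions)
    # A specific base type (e.g. "C") cannot describe a position whose
    # SEQ base differs, which is the problem with C+m on TAPS data.
    for pos in targets:
        if base != "N" and seq[pos] != base:
            raise ValueError(
                f"position {pos} is {seq[pos]!r} in SEQ, not {base!r}"
            )

    deltas = []
    skipped = 0
    for i, b in enumerate(seq):
        if base != "N" and b != base:
            continue  # not a candidate base for this modification type
        if i in targets:
            deltas.append(skipped)
            skipped = 0
        else:
            skipped += 1

    prefix = f"{base}{strand}{code}"
    if not deltas:
        return prefix + ";"
    return prefix + "," + ",".join(str(d) for d in deltas) + ";"


# Example read: the Ts at positions 2 and 5 were mC before TAPS conversion.
seq = "ATTGCTAT"
print(build_mm_tag(seq, [2, 5]))            # counts every base
print(build_mm_tag(seq, [2, 5], base="T"))  # counts only Ts
```

With N as the base, the deltas count every base in SEQ, so the tag stays valid whatever base is actually recorded at the modified position. Passing `base="C"` for the same positions raises `ValueError` in this sketch, mirroring why C+m cannot be used directly on TAPS reads.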
The text already states that an unmodified base of N means we count any base type, but the table entry with base N and code N is a little misleading as to the intention. It was intended to mean any unspecified modification, in the same way that C+C is any unspecified C mod, but in this case it is against all bases rather than a specific base type. However, that doesn't solve the issue of whether we can record specific mods against any "fundamental" source base. Clarified this by adding an extra line to the table and some text. (Note, however, that this does not guarantee downstream processing tools will skip compatibility checks; some may still reject N+m when the SEQ base is a T.) Fixes samtools#785
Hello,
TAPS is a methyl-seq method in which mC bases are converted into T.
In the SAM spec, most of the mC examples use “C+m”. However, C+m does not work off the shelf with TAPS data, since the mC bases are represented in SEQ as Ts and not Cs.
Alternatives for representing the mC mod could include T+m or N+m; however, these are not officially listed in the table of combinations. Is there any recommendation on which format to settle on?
Thanks,
James