-
Notifications
You must be signed in to change notification settings - Fork 174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Copy the tag type text from SAM spec to SAMtags (PR #804) #804
base: master
Are you sure you want to change the base?
Conversation
SAM section 1.5 clearly defines the standard SAM tag types along with the expanded codes used in the B byte-array type. This text has been copied into the SAMtags document to remove a rather woolly definition there. The text describing lower-case tags has been removed, because this was already discussed in further detail at the end of the SAMtags document (and is itself somewhat woolly when it comes to half-upper half-lower combinations due to the addition of draft tags). Fixes samtools#798
Why not link to § 1.5 "The alignment section: optional fields", rather than duplicating the serialization format here? The introduction notes that this is a companion document, which associates the two together anyway. Alternatively, you could replace the type codes with generalized types, e.g., |
I agree with @zaeleus that it'd be better to have a less woolly1 summary of the types that are actually used in the table and a pointer to SAMv1.pdf “for the complete details” rather than duplicate the complete details here. I don't think we should use informal types such as Footnotes
|
If this is woolly then the SAM specification is woolly too as this is literally just replacing the poor text originally here with the SAM specification text. Or maybe I just misunderstand and you're saying this is better than the old text? I simply tightened up what was there before as it had very vague terms ("character" could permit UTF-8, and "hexadecimal array" could imply comma-separated lists of hex values). We do need to be careful about sizing though. SAM is not specifically tied to 32-bit (signed or otherwise), and we've exploited this in the past when we've gone from 32-bit coordinates to 64-bit coordinates. The BAM serialisation has issues, but the SAM encoding does not and we shouldn't needlessly paint ourselves into corners by over specifying as it restricts migration to better encodings. (Sadly CRAM made those same mistakes though and I regret not fixing them for CRAM 3.0) Arguably however, and maybe contentiously, if we wish to only have one canonical source for the meaning of the type fields then it perhaps should be SAMtags. The reason it was split in the first place was that the tag encoding is shared between SAM, BAM and CRAM (and now other things like extended FASTQs and GFA). Hence keeping it in its own document. |
SAM positions are tied to 32-bit signed integers:
SAM data field values, however, are not:
|
SAM section 1.5 clearly defines the standard SAM tag types along with the expanded codes used in the B byte-array type. This text has been copied into the SAMtags document to remove a rather woolly definition there. The text describing lower-case tags has been removed, because this was already discussed in further detail at the end of the SAMtags document (and is itself somewhat woolly when it comes to half-upper half-lower combinations due to the addition of draft tags).
Fixes #798