noodles 0.68.0 #250
Replies: 2 comments 1 reply
-
What's the design rationale behind this? Telomeric breakends are now at {None, Some(chrSize+1)} instead of {0, chrSize+1}. Are data type unification and the safety of disallowing 0 (thus preventing off-by-one errors from assuming 0-based coordinates) more important than symmetry for a VCF feature that has essentially no real-world uptake?
-
Not misinterpreting at all. The design is good; I just wanted to make sure
you were aware of the trade-offs. If I get the time, I might see if I can
design a higher-level SV API that abstracts some of the VCF notational mess.
…On Fri, 5 Apr 2024, 12:15 Michael Macias, ***@***.***> wrote:
Yes, POS is an overloaded field with three states: a 1-based position on
a reference sequence, a virtual telomeric start (0), or a virtual telomeric
end (len(CHROM) + 1). The variant record position field now favors the
1-based position.
We do, however, still want to support the other two states, as noodles is
a specification-based implementation. A user can either pattern match (
Some(n), None, and Some(n) if n == contig.len()) or expand the range,
which is very similar to the previous position (vcf::record::Position)
representation, e.g.,
let pos = record.variant_start().map(usize::from).unwrap_or(0);
Option<Position> provides better data type unification with intervals (
core::region::Interval) and maintains 1-based safety better than the
previous wrapper. Am I misinterpreting your comment?
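
To make the three states concrete, here is a minimal sketch of the pattern-matching approach. It does not use noodles itself: `Position` is a stand-in alias for `std::num::NonZeroUsize` (an assumption about how a 1-based position behaves), and `PosState`, `classify`, and `contig_len` are hypothetical names for illustration only. The telomeric end is taken to be len(CHROM) + 1, as described above.

```rust
use std::num::NonZeroUsize;

// Stand-in for a 1-based position type (assumption: it can never be 0).
type Position = NonZeroUsize;

// Hypothetical enum naming the three states POS can encode.
#[derive(Debug, PartialEq, Eq)]
enum PosState {
    // Virtual telomeric start (POS = 0 in the VCF text).
    TelomereStart,
    // Ordinary 1-based position on the reference sequence.
    OnReference(usize),
    // Virtual telomeric end (POS = contig length + 1).
    TelomereEnd,
}

// Classify an optional variant start against the length of its contig.
fn classify(variant_start: Option<Position>, contig_len: usize) -> PosState {
    match variant_start.map(usize::from) {
        None => PosState::TelomereStart,
        Some(n) if n == contig_len + 1 => PosState::TelomereEnd,
        Some(n) => PosState::OnReference(n),
    }
}
```

The guard arm has to come before the catch-all `Some(n)` arm, otherwise the telomeric end would be reported as an ordinary position.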
-
noodles 0.68.0 reworks the variant (VCF and BCF) crates by defining a common interface over variant record types. It allows for greater flexibility and better overall performance, particularly in read-only contexts. This is very similar to the work done in noodles 0.61.0 with the alignment formats.
Notable changes:
- Readers and writers are moved under an `io` module. For most usages, add `io` to qualified paths, e.g., `vcf::Reader` becomes `vcf::io::Reader`.
- `vcf::Record` is renamed to `vcf::variant::RecordBuf`. This explicitly differentiates it as a mutable record buffer rather than an immutable record type. Most field buffers are now simple wrappers around their underlying data types.
- `vcf::variant::Record` defines shared behavior of a variant record. When supporting multiple variant formats, accept implementations of this trait rather than concrete record types.
- Lazy records are now immutable format records. Replace usages of, e.g., `vcf::lazy::Record` with `vcf::Record`. Format records implement `vcf::variant::Record` and can now be written.
- The record chromosome name (`CHROM`) and position (`POS`) fields are now known as reference sequence name and variant start. The variant start type changes from `usize` to `Option<Position>`. Instead of 0, `None` represents the start of a telomeric breakend.
- Record genotype information (`FORMAT...`) is renamed to samples. It is now represented more like a data frame, where rows are samples and columns are series.
- `Genotype` is a new samples series value type introduced to wrap genotype (`GT`) values. While this is not a type in a `FORMAT` header record type definition, genotype values are specialized in both VCF and BCF.

Please submit questions and feedback here and open new issues if any arise.
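
For code that previously stored the position as a raw `usize`, the change to `Option<Position>` can be pictured with the following sketch. It is not the noodles API: `Position` is a stand-in alias for `std::num::NonZeroUsize`, and `from_raw_pos` and `to_raw_pos` are hypothetical helper names. The only point is that 0 maps to `None` and every other value round-trips unchanged.

```rust
use std::num::NonZeroUsize;

// Stand-in for a 1-based position type (assumption: it can never be 0).
type Position = NonZeroUsize;

// Old-style raw POS -> new-style optional position.
// 0 (telomeric start) becomes None; any other value becomes Some(position).
fn from_raw_pos(raw: usize) -> Option<Position> {
    Position::new(raw)
}

// New-style optional position -> old-style raw POS.
// None is expanded back to 0.
fn to_raw_pos(variant_start: Option<Position>) -> usize {
    variant_start.map(usize::from).unwrap_or(0)
}
```

Because the non-zero type cannot hold 0, a 0-based coordinate can no longer be stored by accident as a valid position.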