[Bioc-devel] VariantAnnotation: Same locus, multiple samples
Valerie Obenchain
vobencha at fredhutch.org
Tue Dec 9 03:16:19 CET 2014
(Resending - the last message didn't post to the list.)
I had suspected that the absence of a header in VRanges would make
collapsing difficult, and with your comments it's clear that collapsing
isn't a good idea.
I like the description you gave of the differences in class content and
geometry and have added it to the VRanges man page.
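
To make the geometry point concrete, here is a minimal sketch in base R. It does not use VariantAnnotation at all: the data frames 'ragged' and 'rect' and the column 'observed' are hypothetical stand-ins for a VRanges and a VCF, chosen only for illustration. It crosses every variant row with every sample, which is what a rectangular layout forces, and counts the resulting cells.

```r
# Ragged, VRanges-like layout: one row per observed (sample, alt) combination.
ragged <- data.frame(
  pos    = c(10, 10),
  ref    = c("C", "C"),
  alt    = c("A", "G"),
  sample = c("S1", "S2"),
  stringsAsFactors = FALSE
)

# Rectangular, VCF-like layout: every variant row crossed with every sample.
# merge(..., by = NULL) returns the Cartesian product of its two inputs.
variants <- unique(ragged[, c("pos", "ref", "alt")])
samples  <- data.frame(sample = unique(ragged$sample), stringsAsFactors = FALSE)
rect     <- merge(variants, samples, by = NULL)

# Mark the cells that correspond to an element of the ragged input.
rect$observed <- paste(rect$alt, rect$sample) %in%
  paste(ragged$alt, ragged$sample)

nrow(ragged)        ## 2
nrow(rect)          ## 4
sum(rect$observed)  ## 2
```

The two extra cells are the ones that come back as additional elements after a round trip, since nothing in the rectangular form marks them as absent rather than merely missing.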
Valerie
On 12/08/14 13:25, Michael Lawrence wrote:
> I don't see how this can be fixed. The two data structures are
> semantically incompatible; they encode different types of information,
> so information is lost in both directions. Even if we collapsed the
> alts, there is no way (as far as I know) to say that data for one
> individual + alt combination is absent. We could put NA (".") for every
> value concerning that alt, but it seems too big of an assumption to say
> that all(is.na()) implies omission of the VRanges
> element. In other words, VCF is rectangular and VRanges is ragged, and
> there is no established way to encode the raggedness in the VCF.
>
>
>
> On Mon, Dec 8, 2014 at 11:27 AM, Valerie Obenchain
> <vobencha at fredhutch.org> wrote:
>
> This could be fixed in the VRanges -> VCF coercion or in VCF -> VRanges.
>
> Currently VRanges -> VCF creates a VCF with >1 row per position (i.e.,
> it does not collapse ALT values). I'm not sure this is technically
> valid according to the VCF spec; however, it may have been by design
> to meet another need. If we are OK with >1 row per position, the
> change can be made in VCF -> VRanges.
>
> Opinions?
>
> Valerie
>
>
>
> On 12/05/2014 01:18 AM, Julian Gehring wrote:
>
> Hi,
>
> Assume that we have two variants from two samples at the same locus,
> stored in a 'VRanges' or 'VCF' object:
>
> library(VariantAnnotation)
>
> vr = VRanges("1", IRanges(c(10, 10), width = 1),
>              ref = c("C", "C"), alt = c("A", "G"),
>              sampleNames = c("S1", "S2"))
> vcf = as(vr, "VCF")
>
> If we convert the VCF back to a VRanges, we now get each variant for
> each sample:
>
> vr2 = as(vcf, "VRanges")
>
> length(vr) ## 2
> length(vr2) ## 4
>
> It seems that the first conversion does not preserve the 'sampleNames'
> information in the VCF object.
>
> Best wishes
> Julian
>
> _________________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
>