[Bioc-devel] VariantAnnotation: Same locus, multiple samples

Valerie Obenchain vobencha at fredhutch.org
Tue Dec 9 03:16:19 CET 2014


(Resending - the last message didn't post to the list.)

I was thinking the absence of a header in VRanges would make collapsing 
difficult, and with your comments it's clear that collapsing isn't a good idea.

I like the description you gave of the differences in class content and 
geometry and have added it to the VRanges man page.

Valerie


On 12/08/14 13:25, Michael Lawrence wrote:
> I don't see how this can be fixed. The two data structures are
> semantically incompatible; they encode different types of information,
> so information is lost in both directions. Even if we collapsed the
> alts, there is no way (as far as I know) to say that data for one
> individual + alt combination is absent. We could put NA (".") for every
> value concerning that alt, but it seems too big of an assumption to say
> that all(is.na()) implies omission of the VRanges
> element. In other words, VCF is rectangular and VRanges is ragged, and
> there is no established way to encode the raggedness in the VCF.
>
>
>
> On Mon, Dec 8, 2014 at 11:27 AM, Valerie Obenchain
> <vobencha at fredhutch.org> wrote:
>
>     This could be fixed in the VRanges -> VCF coercion or in VCF -> VRanges.
>
>     Currently VRanges -> VCF creates a VCF with >1 row per position (i.e.,
>     it does not collapse ALT values). I'm not sure this is technically
>     valid as per the spec; however, it may have been by design to meet
>     another need. If we are OK with >1 row per position, the change can
>     be made in VCF -> VRanges.
>
>     Opinions?
>
>     Valerie
>
>
>
>     On 12/05/2014 01:18 AM, Julian Gehring wrote:
>
>         Hi,
>
>         Assume that we have two variants from two samples at the same locus,
>         stored in a 'VRanges' or 'VCF' object:
>
>             library(VariantAnnotation)
>
>             vr = VRanges("1", IRanges(c(10, 10), width = 1),
>               ref = c("C", "C"), alt = c("A", "G"),
>               sampleNames = c("S1", "S2"))
>             vcf = as(vr, "VCF")
>
>         If we convert the VCF to a VRanges, we now get each variant in each
>         patient:
>
>             vr2 = as(vcf, "VRanges")
>
>             length(vr) ## 2
>             length(vr2) ## 4
>
>         It seems that the VCF object does not store the information of the
>         'sampleNames' in the first conversion.
>
>         Best wishes
>         Julian
>
>
>
>
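
The jump from 2 to 4 elements reported in the original post, and the "rectangular vs. ragged" point made in reply, can be illustrated with plain data frames. This is only a sketch in base R: the data frames are stand-ins for the two layouts, not VariantAnnotation's internal representation, and the `observed` column is hypothetical. As noted in the thread, VCF has no established field that plays this role.

```r
# Ragged, VRanges-like layout: one row per (sample, alt) actually observed.
ragged <- data.frame(
  pos    = c(10L, 10L),
  ref    = c("C", "C"),
  alt    = c("A", "G"),
  sample = c("S1", "S2"),
  stringsAsFactors = FALSE
)

# Rectangular, VCF-like layout: every alt at the locus crossed with
# every sample, whether or not that combination was observed.
rect <- expand.grid(
  alt    = unique(ragged$alt),
  sample = unique(ragged$sample),
  stringsAsFactors = FALSE
)

# Hypothetical marker for which grid cells were present in the ragged input.
observed_keys <- paste(ragged$alt, ragged$sample)
rect$observed <- paste(rect$alt, rect$sample) %in% observed_keys

nrow(ragged)        # 2
nrow(rect)          # 4
sum(rect$observed)  # 2
```

Without something like the `observed` column, the two unobserved cells cannot be told apart from cells whose values simply happen to be missing, which is why the round trip cannot recover the original two elements.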


