[Bioc-devel] VRanges with multiple samples

Michael Lawrence lawrence.michael at gene.com
Wed Jan 28 18:22:15 CET 2015


On Wed, Jan 28, 2015 at 8:47 AM, Robert Castelo <robert.castelo at upf.edu>
wrote:

> hi,
>
> currently, the VariantFiltering package works with GRanges objects
> obtained from locateVariants() and predictCoding() to hold annotated
> variants and add further annotations. However, I'd like to use 'VRanges'
> objects which are, as far as i understand them, developed for exactly the
> purpose of storing and manipulating variants and their annotations.
>
> from the available documentation, it seems to me that the route for this
> should be coercing the 'VCF' object obtained with 'readVcf()' to 'VRanges'
> via as(vcf, "VRanges"). When the input 'VCF' object has more than one
> sample, this results in a 'VRanges' object with the variants replicated
> per sample and a sample indicator column.
>
> i was thinking that as one adds annotations to variants, the redundancy of
> the information stored in a "multi-sample VRanges" would greatly increase,
> so I was thinking of working with a minimal 'multi-sample VRanges' holding
> the sample-specific information and storing the annotations in a separate
> DataFrame object with some index column that would link to the
> corresponding 'VRanges' "row".
>

Is your concern here scalability, ease of use, or what? If scalability, we
should probably start thinking about a more efficient representation for
repeated vectors, kind of like Rle, except for rep(,each=FALSE). It would
just %% the index. I think this would be generally useful and so may be of
more value than a more complex VRanges. After all, it is the (totally
justifiable) complexity of VCF that motivated VRanges in the first place.
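
To make the idea concrete, here is a minimal base-R sketch of such a
"virtually repeated" vector. The class name RepVector and its fields are
hypothetical (nothing like this is claimed to exist in S4Vectors); it assumes
the repetition is in blocks, as in rep(x, times = n), so that a position can
be mapped back to the stored values with %%.

```r
# Hypothetical S3 class: stores the values once plus a repeat count,
# and behaves like rep(values, times = times) when indexed.
RepVector <- function(values, times) {
  structure(list(values = values, times = times), class = "RepVector")
}

# Logical length of the repeated vector (not of the stored values)
length.RepVector <- function(x) length(x$values) * x$times

# Map 1-based positions onto the stored values by wrapping with %%
`[.RepVector` <- function(x, i) {
  x$values[(i - 1L) %% length(x$values) + 1L]
}

# Made-up annotations for 3 variants, shared across 4 samples
ann <- RepVector(c("missense", "synonymous", "intronic"), times = 4L)
length(ann)
ann[c(1, 4, 5, 12)]
```

Only the three annotation values are held in memory, while indexing behaves
as if the vector had been expanded once per sample.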


> i'd like to ask you if you have thoughts, suggestions or comments about
> this redundancy issue and this approach i'm thinking about.
>
>
> btw, in the presence of multiple samples, i would find it more natural to
> coerce a 'VCF' object into a VRangesList object, instead of a VRanges with
> a sample indicator column.
>
> there is in fact the 'stackSamples()' method to compress a 'VRangesList'
> into a 'VRanges' with a sample indicator column, however there is no
> coercion method:
>
> as(vcf, "VRangesList")
> Error in as(vcf, "VRangesList") :
>   no method or default for coercing “CollapsedVCF” to “VRangesList”
>
> i guess i can write some one-liner to get a 'VRangesList' from a
> multi-sample VCF with one 'VRanges' element per sample, but i wonder
> whether it would not make sense to have this as an
> 'as(vcf, "VRangesList")' method.
>
>
I am not sure if coercion via as() would make sense here, since there is no
obvious reason why the split would be by sample. Why not just use split(vr,
sampleNames(vr))? That should work already.
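
As a rough illustration of that suggestion, the sketch below builds a tiny
two-sample VRanges by hand and splits it by sample. The coordinates, alleles
and sample names ("s1", "s2") are made up, and it assumes the
VariantAnnotation package is installed; the exact class of the returned list
may differ between package versions.

```r
library(VariantAnnotation)

# Toy data (made up): two variants, each reported for two samples
vr <- VRanges(seqnames = rep("chr1", 4),
              ranges = IRanges(start = c(100, 200, 100, 200), width = 1),
              ref = c("A", "C", "A", "C"),
              alt = c("G", "T", "G", "T"),
              sampleNames = c("s1", "s1", "s2", "s2"))

# Split into a list-like object with one element per sample
by_sample <- split(vr, sampleNames(vr))
names(by_sample)
```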


>
> thanks!!!!
> robert.
>
