[Bioc-devel] VRanges with multiple samples

Robert Castelo robert.castelo at upf.edu
Thu Jan 29 18:36:16 CET 2015


hi Michael, thanks for sharing your opinion, comments below,

On 01/28/2015 06:22 PM, Michael Lawrence wrote:
[...]

> Is your concern here scalability, ease of use, or what? If scalability,
> we should probably start thinking about a more efficient representation
> for repeated vectors, kind of like Rle, except for rep(,each=FALSE). It
> would just %% the index. I think this would be generally useful and so
> may be of more value than a more complex VRanges. After all, it is the
> (totally justifiable) complexity of VCF that motivated VRanges in the
> first place.

i'm concerned about scalability with multisample VCFs when adding
annotations. What you propose, using Rle-like vectors to store
identical values from different samples together, sounds good to me, and
I'm also in favor of keeping data structures as simple as possible.
Maybe for the time being I'll try to use 'VRanges' just as they are now
and explore how bad it gets when scaling up the number of samples and
annotations, to justify doing something about it along the lines you suggest.
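
To make the idea concrete, here is a minimal base-R sketch of a vector that stores a single copy of its values and maps indices back with %%, as in rep(x, times = n). The names cyclic_vector(), cyclic_length() and cyclic_subset() are hypothetical and exist only for illustration; nothing like this is part of VariantAnnotation or S4Vectors as far as this thread says.

```r
# Hypothetical helpers, not an existing Bioconductor API.
# Store one copy of 'values' plus how many times it is logically repeated.
cyclic_vector <- function(values, times) {
  list(values = values, times = as.integer(times))
}

# Logical length: what rep(values, times = times) would have.
cyclic_length <- function(x) length(x$values) * x$times

# Subsetting: map each 1-based index onto the stored copy with %%.
cyclic_subset <- function(x, i) {
  stopifnot(all(i >= 1), all(i <= cyclic_length(x)))
  x$values[((i - 1L) %% length(x$values)) + 1L]
}

# One annotation per variant, shared by 3 samples (made-up values).
effect <- c("missense", "synonymous", "intronic")
cv <- cyclic_vector(effect, times = 3)

cyclic_length(cv)                                         # logical length
identical(cyclic_subset(cv, 1:9), rep(effect, times = 3)) # same as expanded
```

Only three values are kept in memory here, while the object behaves like a vector of length nine.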

[...]

> I am not sure if coercion via as() would make sense here, since there is
> no obvious reason why the split would be by sample. Why not just use
> split(vr, sampleNames(vr))? That should work already.

i see your point that splitting a VRanges object could be motivated by
something other than sample, and as you suggest 'split()' does the job
very fast. actually, by passing its result to the VRangesList constructor
i get what i was looking for:

do.call("VRangesList", split(vr, sampleNames(vr)))
VRangesList of length 3
names(3): sample1 sample2 sample3


although i realize now that the Rle-like strategy you propose would then
not be usable when splitting by sample.
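
One way to read this remark, sketched in base R with made-up data: if per-variant annotations are laid out as rep(x, times = n) with rows ordered sample by sample, then splitting by sample leaves each piece with exactly one full copy of the annotations, so there is no repetition left inside a piece to compress. The row ordering and the toy vectors are assumptions for illustration only.

```r
# Made-up annotations, one per variant, and three sample names.
effect  <- c("missense", "synonymous", "intronic")
samples <- c("sample1", "sample2", "sample3")

# Expanded layout: annotations cycle, samples are repeated in blocks.
expanded      <- rep(effect, times = length(samples))
sample_of_row <- rep(samples, each = length(effect))

# Splitting by sample yields one full copy of 'effect' per sample.
by_sample <- split(expanded, sample_of_row)
names(by_sample)
all(vapply(by_sample, identical, logical(1), effect))
```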

cheers,

robert.


