[Bioc-devel] show method for CompressedVRangesList-class

Michael Lawrence lawrence.michael at gene.com
Wed Feb 25 22:06:34 CET 2015


I checked in a fix for the splitting to CompressedVRangesList. The slowness
of creating a SimpleVRangesList is due to the cost of extracting a VRanges
for each sample. Depending your exact use case, it might be better to pay
that cost up-front, instead of deferring it to when the user wants to
extract an element, which happens with the compressed list. As long as the
number of samples is small, the memory overhead should be minimal.

Michael

On Wed, Feb 25, 2015 at 9:59 AM, Michael Lawrence <michafla at gene.com> wrote:

> Yea, I know, just need to get around to that one. Technically, it works,
> but it's obviously not ideal.
>
> On Wed, Feb 25, 2015 at 8:52 AM, Gabe Becker <becker.gabe at gene.com> wrote:
>
>> Why does splitting a VRanges give a GRangesList with VRanges objects as
>> elements? Seems like it should return a VRangesList.
>>
>> > spl = split(vr, sampleNames(vr))
>> > class(spl)
>> [1] "GRangesList"
>> attr(,"package")
>> [1] "GenomicRanges"
>> > class(spl[[1]])
>> [1] "VRanges"
>> attr(,"package")
>> [1] "VariantAnnotation"
>>
>>
>> ~G
>>
>> On Wed, Feb 25, 2015 at 8:39 AM, Michael Lawrence <
>> lawrence.michael at gene.com> wrote:
>>
>>> Construction will take longer; the savings are in the accessing of the
>>> elements. But this seems like too much longer, so I will look into it.
>>>
>>> On Wed, Feb 25, 2015 at 8:12 AM, Robert Castelo <robert.castelo at upf.edu>
>>> wrote:
>>>
>>> > my current reason to prefer a CompressedVRangesList object over a
>>> > SimpleVRangesList object is that i find one order of magnitude
>>> difference
>>> > in creation time in each of these classes of objects:
>>> >
>>> > library(VariantAnnotation)
>>> >
>>> > fl <- system.file("extdata", "CEUtrio.vcf.bgz",
>>> >                   package="VariantFiltering")
>>> >
>>> > vcf <- readVcf(fl, genome="hg19")
>>> > vr <- as(vcf, "VRanges")
>>> > length(vr)
>>> > [1] 15000
>>> >
>>> > ## create a VRangesList object
>>> > system.time(vrl <- do.call("VRangesList", split(vr, sampleNames(vr))))
>>> >    user  system elapsed
>>> >   0.247   0.004   0.252
>>> >
>>> > ## create a CompressedVRangesList object
>>> > system.time(cvrl <- new("CompressedVRangesList", split(vr,
>>> > sampleNames(vr))))
>>> >    user  system elapsed
>>> >   0.019   0.000   0.019
>>> >
>>> > 0.252/0.019
>>> > [1] 13.26316
>>> >
>>> > with a larger vcf differences increase:
>>> >
>>> > [... load vcf, coerce to VRanges ...]
>>> > length(vr)
>>> > [1] 25916
>>> >
>>> > system.time(vrl <- do.call("VRangesList", split(vr, sampleNames(vr))))
>>> >    user  system elapsed
>>> >   2.672   0.000   2.676
>>> >
>>> > system.time(cvrl <- new("CompressedVRangesList", split(vr,
>>> > sampleNames(vr))))
>>> >    user  system elapsed
>>> >   0.014   0.000   0.014
>>> >
>>> > 2.676 / 0.014
>>> > [1] 191.1429
>>> >
>>> >
>>> > so maybe i'm using the wrong way to construct a VRangesList object, but
>>> > according to our last conversation about this, there was no obvious
>>> default
>>> > fast way to do it, starting from a VRanges object:
>>> >
>>> > https://stat.ethz.ch/pipermail/bioc-devel/2015-January/006905.html
>>> >
>>> > it would be great if there's a fast way to do this kind of
>>> construction.
>>> >
>>> > thanks,
>>> >
>>> > robert.
>>> >
>>> > On 02/25/2015 04:42 PM, Michael Lawrence wrote:
>>> >
>>> >> If you're storing data on a relatively small number of individuals
>>> (say,
>>> >> hundreds), you should use SimpleVRangesList, not
>>> CompressedVRangesList.
>>> >>
>>> >> On Wed, Feb 25, 2015 at 7:10 AM, Robert Castelo <
>>> robert.castelo at upf.edu
>>> >> <mailto:robert.castelo at upf.edu>> wrote:
>>> >>
>>> >>     i see you point, the logic i was thinking about is to use a list
>>> of
>>> >>     VRanges objects to hold separately the variants of multiple
>>> >>     individuals, with one VRanges object per individual.
>>> >>
>>> >>     if i type the name of such a list object on the R shell, having
>>> the
>>> >>     GRangesList show method, i feel i do not see much information
>>> >>     because the screen just scrolls up tens or hundreds of lines
>>> >>     specifiying variants per individual. however, the concise
>>> appearance
>>> >>     of something like a VRangesList:
>>> >>
>>> >>      > vrl
>>> >>     VRangesList of length 10
>>> >>     names(32): S1 S2 S3 S4 ... S7 S8 S9 S10
>>> >>
>>> >>     at least suggests the user that the object holding the variants
>>> has
>>> >>     information for 10 samples and belongs to the class 'VRangesList'.
>>> >>
>>> >>     i thought this made general sense but i'm fine if you feel this
>>> >>     interpretation does not warrant such a change.
>>> >>
>>> >>     cheers,
>>> >>
>>> >>     robert.
>>> >>
>>> >>     On 02/25/2015 01:25 AM, Michael Lawrence wrote:
>>> >>
>>> >>         Why not have the SimpleVRangesList be shown like
>>> >>         CompressedVRangesList,
>>> >>         for consistency with GRangesList? In other words, the opposite
>>> >>         of what
>>> >>         you propose. A strong argument could also be made that a
>>> >>         SimpleGenomicRangesList should be shown like a GRangesList.
>>> >>         Unless there
>>> >>         is some aversion to the more verbose output....
>>> >>
>>> >>         On Tue, Feb 24, 2015 at 2:36 PM, Robert Castelo
>>> >>         <robert.castelo at upf.edu <mailto:robert.castelo at upf.edu>
>>> >>         <mailto:robert.castelo at upf.edu
>>> >>
>>> >>         <mailto:robert.castelo at upf.edu>__>> wrote:
>>> >>
>>> >>              so, yes, but IMO rather than inheriting the show method
>>> from
>>> >> a
>>> >>              GRangesList, i think that the show method for
>>> >>         CompressedVRangesList
>>> >>              objects should be inherited from a VRangesList object.
>>> >>         right now
>>> >>              this is the situation:
>>> >>
>>> >>              library(VariantAnnotation)
>>> >>
>>> >>              example(VRangesList)
>>> >>              vrl
>>> >>              VRangesList of length 2
>>> >>              names(2): sampleA sampleB
>>> >>
>>> >>              cvrl <- new("CompressedVRangesList", split(vr,
>>> >>         sampleNames(vr)))
>>> >>              cvrl
>>> >>              CompressedVRangesList object of length 2:
>>> >>              $a
>>> >>              VRanges object with 1 range and 1 metadata column:
>>> >>                     seqnames    ranges strand         ref
>>>   alt
>>> >>              totalDepth       refDepth       altDepth
>>> >>         <Rle> <IRanges> <Rle> <character> <characterOrRle>
>>> <integerOrRle>
>>> >>         <integerOrRle> <integerOrRle>
>>> >>                 [1]     chr1    [1, 5]      +           T
>>> >>              C             12              5              7
>>> >>                       sampleNames softFilterMatrix | tumorSpecific
>>> >>         <factorOrRle> <matrix> | <logical>
>>> >>                 [1]             a             TRUE |         FALSE
>>> >>
>>> >>              $b
>>> >>              VRanges object with 1 range and 1 metadata column:
>>> >>                     seqnames   ranges strand ref alt totalDepth
>>> refDepth
>>> >>         altDepth
>>> >>              sampleNames softFilterMatrix |
>>> >>                 [1]     chr2 [10, 20]      +   A   T         17
>>>  10
>>> >>              6           b            FALSE |
>>> >>                     tumorSpecific
>>> >>                 [1]          TRUE
>>> >>
>>> >>              -------
>>> >>              seqinfo: 2 sequences from an unspecified genome; no
>>> >> seqlengths
>>> >>
>>> >>              would it be possible to have the VRangesList show method
>>> for
>>> >>              CompressedVRangesList objects?
>>> >>
>>> >>              robert.
>>> >>
>>> >>
>>> >>
>>> >>              On 2/24/15 7:24 PM, Michael Lawrence wrote:
>>> >>
>>> >>                  I think you might be missing an import. It should
>>> >>             inherit the
>>> >>                  method for GRangesList.
>>> >>
>>> >>                  On Tue, Feb 24, 2015 at 9:53 AM, Robert Castelo
>>> >>             <robert.castelo at upf.edu <mailto:robert.castelo at upf.edu>
>>> >>             <mailto:robert.castelo at upf.edu
>>> >>             <mailto:robert.castelo at upf.edu>__>> wrote:
>>> >>
>>> >>                      hi,
>>> >>
>>> >>                      i'm using the CompressedVRangesList class in
>>> >>             VariantFiltering
>>> >>                      to hold variants and their annotations across
>>> >>             multiple samples
>>> >>                      and found that there was no show method for this
>>> >>             class (unless
>>> >>                      i'm missing the right import here) so i made one
>>> >> within
>>> >>                      VariantFiltering by copying&pasting from other
>>> >>             similar classes:
>>> >>
>>> >>                      setMethod("show",
>>> >>             signature(object="__CompressedVRangesList"),
>>> >>                                function(object) {
>>> >>                                  lo <- length(object)
>>> >>                                  cat(classNameForDisplay(__object), "
>>> of
>>> >>             length ",
>>> >>                      lo, "\n",
>>> >>                                      sep = "")
>>> >>                                  if (!is.null(names(object)))
>>> >>                                    cat(BiocGenerics:::__
>>> >> labeledLine("names",
>>> >>                      names(object)))
>>> >>                                })
>>> >>
>>> >>                      i guess, however, that the right home for this
>>> would
>>> >> be
>>> >>                      VariantAnnotation. let me know if you consider
>>> >>             adding it there
>>> >>                      (or somewhere else) and i'll remove it from
>>> >>             VariantFiltering.
>>> >>
>>> >>                      thanks,
>>> >>
>>> >>                      robert.
>>> >>
>>> >>                      _________________________________________________
>>> >>             Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.org
>>> >
>>> >>             <mailto:Bioc-devel at r-project.__org
>>> >>             <mailto:Bioc-devel at r-project.org>>
>>> >>                      mailing list
>>> >>             https://stat.ethz.ch/mailman/__listinfo/bioc-devel
>>> >>             <https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>     --
>>> >>     Robert Castelo, PhD
>>> >>     Associate Professor
>>> >>     Dept. of Experimental and Health Sciences
>>> >>     Universitat Pompeu Fabra (UPF)
>>> >>     Barcelona Biomedical Research Park (PRBB)
>>> >>     Dr Aiguader 88
>>> >>     E-08003 Barcelona, Spain
>>> >>     telf: +34.933.160.514 <tel:%2B34.933.160.514>
>>> >>     fax: +34.933.160.550 <tel:%2B34.933.160.550>
>>> >>
>>> >>
>>> >>
>>> > --
>>> > Robert Castelo, PhD
>>> > Associate Professor
>>> > Dept. of Experimental and Health Sciences
>>> > Universitat Pompeu Fabra (UPF)
>>> > Barcelona Biomedical Research Park (PRBB)
>>> > Dr Aiguader 88
>>> > E-08003 Barcelona, Spain
>>> > telf: +34.933.160.514
>>> > fax: +34.933.160.550
>>> >
>>>
>>>         [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>
>>
>>
>> --
>> Gabriel Becker, Ph.D
>> Computational Biologist
>> Genentech Research
>>
>
>

	[[alternative HTML version deleted]]



More information about the Bioc-devel mailing list