[Bioc-devel] show method for CompressedVRangesList-class

Robert Castelo robert.castelo at upf.edu
Thu Feb 26 16:56:59 CET 2015


great, thanks!!

robert.

On 02/25/2015 10:06 PM, Michael Lawrence wrote:
> I checked in a fix for the splitting to CompressedVRangesList. The
> slowness of creating a SimpleVRangesList is due to the cost of
> extracting a VRanges for each sample. Depending on your exact use case, it
> might be better to pay that cost up-front, instead of deferring it to
> when the user wants to extract an element, which happens with the
> compressed list. As long as the number of samples is small, the memory
> overhead should be minimal.
>
> Michael
>
> On Wed, Feb 25, 2015 at 9:59 AM, Michael Lawrence <michafla at gene.com> wrote:
>
>     Yea, I know, just need to get around to that one. Technically, it
>     works, but it's obviously not ideal.
>
>     On Wed, Feb 25, 2015 at 8:52 AM, Gabe Becker <becker.gabe at gene.com> wrote:
>
>         Why does splitting a VRanges give a GRangesList with VRanges
>         objects as elements? Seems like it should return a VRangesList.
>
>              > spl = split(vr, sampleNames(vr))
>              > class(spl)
>             [1] "GRangesList"
>             attr(,"package")
>             [1] "GenomicRanges"
>              > class(spl[[1]])
>             [1] "VRanges"
>             attr(,"package")
>             [1] "VariantAnnotation"
>
>
>         ~G
>
>         On Wed, Feb 25, 2015 at 8:39 AM, Michael Lawrence
>         <lawrence.michael at gene.com> wrote:
>
>             Construction will take longer; the savings are in accessing
>             the elements. But this seems like too much longer, so I will
>             look into it.
>
>             On Wed, Feb 25, 2015 at 8:12 AM, Robert Castelo
>             <robert.castelo at upf.edu> wrote:
>
>              > my current reason to prefer a CompressedVRangesList object
>              > over a SimpleVRangesList object is that i find an order of
>              > magnitude difference in creation time between these two
>              > classes of objects:
>              >
>              > library(VariantAnnotation)
>              >
>              > fl <- system.file("extdata", "CEUtrio.vcf.bgz",
>              >                   package="VariantFiltering")
>              >
>              > vcf <- readVcf(fl, genome="hg19")
>              > vr <- as(vcf, "VRanges")
>              > length(vr)
>              > [1] 15000
>              >
>              > ## create a VRangesList object
>              > system.time(vrl <- do.call("VRangesList",
>              >                            split(vr, sampleNames(vr))))
>              >    user  system elapsed
>              >   0.247   0.004   0.252
>              >
>              > ## create a CompressedVRangesList object
>              > system.time(cvrl <- new("CompressedVRangesList",
>              >                         split(vr, sampleNames(vr))))
>              >    user  system elapsed
>              >   0.019   0.000   0.019
>              >
>              > 0.252/0.019
>              > [1] 13.26316
>              >
>              > with a larger vcf the difference increases:
>              >
>              > [... load vcf, coerce to VRanges ...]
>              > length(vr)
>              > [1] 25916
>              >
>              > system.time(vrl <- do.call("VRangesList",
>              >                            split(vr, sampleNames(vr))))
>              >    user  system elapsed
>              >   2.672   0.000   2.676
>              >
>              > system.time(cvrl <- new("CompressedVRangesList",
>              >                         split(vr, sampleNames(vr))))
>              >    user  system elapsed
>              >   0.014   0.000   0.014
>              >
>              > 2.676 / 0.014
>              > [1] 191.1429
>              >
>              >
>              > so maybe i'm constructing the VRangesList object the wrong
>              > way, but according to our last conversation about this, there
>              > was no obvious default fast way to do it, starting from a
>              > VRanges object:
>              >
>              > https://stat.ethz.ch/pipermail/bioc-devel/2015-January/006905.html
>              >
>              > it would be great if there were a fast way to do this kind of
>              > construction.
>              >
>              > thanks,
>              >
>              > robert.
>              >
>              > On 02/25/2015 04:42 PM, Michael Lawrence wrote:
>              >
>              >> If you're storing data on a relatively small number of
>              >> individuals (say, hundreds), you should use SimpleVRangesList,
>              >> not CompressedVRangesList.
>              >>
>              >> On Wed, Feb 25, 2015 at 7:10 AM, Robert Castelo
>              >> <robert.castelo at upf.edu> wrote:
>              >>
>              >>     i see your point. the logic i was thinking about is to use
>              >>     a list of VRanges objects to hold separately the variants of
>              >>     multiple individuals, with one VRanges object per individual.
>              >>
>              >>     if i type the name of such a list object in the R shell,
>              >>     having the GRangesList show method, i feel i do not see much
>              >>     information because the screen just scrolls up tens or
>              >>     hundreds of lines specifying variants per individual.
>              >>     however, the concise appearance of something like a
>              >>     VRangesList:
>              >>
>              >>     > vrl
>              >>     VRangesList of length 10
>              >>     names(10): S1 S2 S3 S4 ... S7 S8 S9 S10
>              >>
>              >>     at least tells the user that the object holding the
>              >>     variants has information for 10 samples and belongs to the
>              >>     class 'VRangesList'.
>              >>
>              >>     i thought this made general sense but i'm fine if you feel
>              >>     this interpretation does not warrant such a change.
>              >>
>              >>     cheers,
>              >>
>              >>     robert.
>              >>
>              >>     On 02/25/2015 01:25 AM, Michael Lawrence wrote:
>              >>
>              >>         Why not have the SimpleVRangesList be shown like
>              >>         CompressedVRangesList, for consistency with GRangesList?
>              >>         In other words, the opposite of what you propose. A strong
>              >>         argument could also be made that a SimpleGenomicRangesList
>              >>         should be shown like a GRangesList. Unless there is some
>              >>         aversion to the more verbose output....
>              >>
>              >>         On Tue, Feb 24, 2015 at 2:36 PM, Robert Castelo
>              >>         <robert.castelo at upf.edu> wrote:
>              >>
>              >>              so, yes, but IMO rather than inheriting the show
>              >>              method from a GRangesList, i think that the show
>              >>              method for CompressedVRangesList objects should be
>              >>              inherited from a VRangesList object. right now this is
>              >>              the situation:
>              >>
>              >>              library(VariantAnnotation)
>              >>
>              >>              example(VRangesList)
>              >>              vrl
>              >>              VRangesList of length 2
>              >>              names(2): sampleA sampleB
>              >>
>              >>              cvrl <- new("CompressedVRangesList",
>              >>                          split(vr, sampleNames(vr)))
>              >>              cvrl
>              >>              CompressedVRangesList object of length 2:
>              >>              $a
>              >>              VRanges object with 1 range and 1 metadata column:
>              >>                    seqnames    ranges strand         ref              alt     totalDepth       refDepth       altDepth
>              >>                       <Rle> <IRanges>  <Rle> <character> <characterOrRle> <integerOrRle> <integerOrRle> <integerOrRle>
>              >>                [1]     chr1    [1, 5]      +           T                C             12              5              7
>              >>                      sampleNames softFilterMatrix | tumorSpecific
>              >>                    <factorOrRle>         <matrix> |     <logical>
>              >>                [1]             a             TRUE |         FALSE
>              >>
>              >>              $b
>              >>              VRanges object with 1 range and 1 metadata column:
>              >>                    seqnames   ranges strand ref alt totalDepth refDepth altDepth sampleNames softFilterMatrix |
>              >>                [1]     chr2 [10, 20]      +   A   T         17       10        6           b            FALSE |
>              >>                    tumorSpecific
>              >>                [1]          TRUE
>              >>
>              >>              -------
>              >>              seqinfo: 2 sequences from an unspecified genome; no seqlengths
>              >>
>              >>              would it be possible to have the VRangesList show
>              >>              method for CompressedVRangesList objects?
>              >>
>              >>              robert.
>              >>
>              >>
>              >>
>              >>              On 2/24/15 7:24 PM, Michael Lawrence wrote:
>              >>
>              >>                  I think you might be missing an import. It should
>              >>                  inherit the method for GRangesList.
>              >>
>              >>                  On Tue, Feb 24, 2015 at 9:53 AM, Robert Castelo
>              >>                  <robert.castelo at upf.edu> wrote:
>              >>
>              >>                      hi,
>              >>
>              >>                      i'm using the CompressedVRangesList class in
>              >>                      VariantFiltering to hold variants and their
>              >>                      annotations across multiple samples and found
>              >>                      that there was no show method for this class
>              >>                      (unless i'm missing the right import here) so i
>              >>                      made one within VariantFiltering by
>              >>                      copying & pasting from other similar classes:
>              >>
>              >>                      setMethod("show", signature(object="CompressedVRangesList"),
>              >>                                function(object) {
>              >>                                  lo <- length(object)
>              >>                                  cat(classNameForDisplay(object), " of length ", lo, "\n",
>              >>                                      sep = "")
>              >>                                  if (!is.null(names(object)))
>              >>                                    cat(BiocGenerics:::labeledLine("names", names(object)))
>              >>                                })
>              >>
>              >>                      i guess, however, that the right home for this
>              >>                      would be VariantAnnotation. let me know if you
>              >>                      consider adding it there (or somewhere else) and
>              >>                      i'll remove it from VariantFiltering.
>              >>
>              >>                      thanks,
>              >>
>              >>                      robert.
>              >>
>              >>
>              >>                      _________________________________________________
>              >>                      Bioc-devel at r-project.org mailing list
>              >>                      https://stat.ethz.ch/mailman/listinfo/bioc-devel
>              >>
>              >>
>              >>
>              >>
>              >>
>              >>     --
>              >>     Robert Castelo, PhD
>              >>     Associate Professor
>              >>     Dept. of Experimental and Health Sciences
>              >>     Universitat Pompeu Fabra (UPF)
>              >>     Barcelona Biomedical Research Park (PRBB)
>              >>     Dr Aiguader 88
>              >>     E-08003 Barcelona, Spain
>              >>     telf: +34.933.160.514
>              >>     fax: +34.933.160.550
>              >>
>              >>
>              >>
>              >
>
>
>
>
>
>         --
>         Gabriel Becker, Ph.D
>         Computational Biologist
>         Genentech Research
>
>
>

-- 
Robert Castelo, PhD
Associate Professor
Dept. of Experimental and Health Sciences
Universitat Pompeu Fabra (UPF)
Barcelona Biomedical Research Park (PRBB)
Dr Aiguader 88
E-08003 Barcelona, Spain
telf: +34.933.160.514
fax: +34.933.160.550


