[Bioc-devel] show method for CompressedVRangesList-class
Robert Castelo
robert.castelo at upf.edu
Thu Feb 26 16:56:59 CET 2015
great, thanks!!
robert.
On 02/25/2015 10:06 PM, Michael Lawrence wrote:
> I checked in a fix for the splitting to CompressedVRangesList. The
> slowness of creating a SimpleVRangesList is due to the cost of
> extracting a VRanges for each sample. Depending on your exact use case, it
> might be better to pay that cost up-front, instead of deferring it to
> when the user wants to extract an element, which happens with the
> compressed list. As long as the number of samples is small, the memory
> overhead should be minimal.
>
> Michael
>
> On Wed, Feb 25, 2015 at 9:59 AM, Michael Lawrence <michafla at gene.com> wrote:
>
> Yea, I know, just need to get around to that one. Technically, it
> works, but it's obviously not ideal.
>
> On Wed, Feb 25, 2015 at 8:52 AM, Gabe Becker <becker.gabe at gene.com> wrote:
>
> Why does splitting a VRanges give a GRangesList with VRanges
> objects as elements? Seems like it should return a VRangesList.
>
> > spl = split(vr, sampleNames(vr))
> > class(spl)
> [1] "GRangesList"
> attr(,"package")
> [1] "GenomicRanges"
> > class(spl[[1]])
> [1] "VRanges"
> attr(,"package")
> [1] "VariantAnnotation"
>
>
> ~G
>
> On Wed, Feb 25, 2015 at 8:39 AM, Michael Lawrence
> <lawrence.michael at gene.com> wrote:
>
> Construction will take longer; the savings are in accessing the
> elements. But this seems like too much longer, so I will look into it.
>
> On Wed, Feb 25, 2015 at 8:12 AM, Robert Castelo
> <robert.castelo at upf.edu> wrote:
>
> > my current reason to prefer a CompressedVRangesList object over a
> > SimpleVRangesList object is that i find one order of magnitude
> > difference in creation time between these two classes of objects:
> >
> > library(VariantAnnotation)
> >
> > fl <- system.file("extdata", "CEUtrio.vcf.bgz",
> > package="VariantFiltering")
> >
> > vcf <- readVcf(fl, genome="hg19")
> > vr <- as(vcf, "VRanges")
> > length(vr)
> > [1] 15000
> >
> > ## create a VRangesList object
> > system.time(vrl <- do.call("VRangesList", split(vr,
> > sampleNames(vr))))
> > user system elapsed
> > 0.247 0.004 0.252
> >
> > ## create a CompressedVRangesList object
> > system.time(cvrl <- new("CompressedVRangesList", split(vr,
> > sampleNames(vr))))
> > user system elapsed
> > 0.019 0.000 0.019
> >
> > 0.252/0.019
> > [1] 13.26316
> >
> > with a larger vcf the difference increases:
> >
> > [... load vcf, coerce to VRanges ...]
> > length(vr)
> > [1] 25916
> >
> > system.time(vrl <- do.call("VRangesList", split(vr,
> > sampleNames(vr))))
> > user system elapsed
> > 2.672 0.000 2.676
> >
> > system.time(cvrl <- new("CompressedVRangesList", split(vr,
> > sampleNames(vr))))
> > user system elapsed
> > 0.014 0.000 0.014
> >
> > 2.676 / 0.014
> > [1] 191.1429
> >
> >
> > so maybe i'm using the wrong way to construct a VRangesList object, but
> > according to our last conversation about this, there was no obvious
> > default fast way to do it, starting from a VRanges object:
> >
> > https://stat.ethz.ch/pipermail/bioc-devel/2015-January/006905.html
> >
> > it would be great if there's a fast way to do this kind of construction.
> >
> > thanks,
> >
> > robert.
> >
> > On 02/25/2015 04:42 PM, Michael Lawrence wrote:
> >
> >> If you're storing data on a relatively small number of individuals
> >> (say, hundreds), you should use SimpleVRangesList, not
> >> CompressedVRangesList.
> >>
> >> On Wed, Feb 25, 2015 at 7:10 AM, Robert Castelo
> >> <robert.castelo at upf.edu> wrote:
> >>
> >> i see your point, the logic i was thinking about is to use a list of
> >> VRanges objects to hold separately the variants of multiple
> >> individuals, with one VRanges object per individual.
> >>
> >> if i type the name of such a list object on the R shell, having the
> >> GRangesList show method, i feel i do not see much information
> >> because the screen just scrolls up tens or hundreds of lines
> >> specifying variants per individual. however, the concise appearance
> >> of something like a VRangesList:
> >>
> >> > vrl
> >> VRangesList of length 10
> >> names(10): S1 S2 S3 S4 ... S7 S8 S9 S10
> >>
> >> at least suggests to the user that the object holding the variants
> >> has information for 10 samples and belongs to the class
> >> 'VRangesList'.
> >>
> >> i thought this made general sense but i'm fine if you feel this
> >> interpretation does not warrant such a change.
> >>
> >> cheers,
> >>
> >> robert.
> >>
> >> On 02/25/2015 01:25 AM, Michael Lawrence wrote:
> >>
> >> Why not have the SimpleVRangesList be shown like
> >> CompressedVRangesList, for consistency with GRangesList? In other
> >> words, the opposite of what you propose. A strong argument could also
> >> be made that a SimpleGenomicRangesList should be shown like a
> >> GRangesList. Unless there is some aversion to the more verbose
> >> output....
> >>
> >> On Tue, Feb 24, 2015 at 2:36 PM, Robert Castelo
> >> <robert.castelo at upf.edu> wrote:
> >>
> >> so, yes, but IMO rather than inheriting the show method from a
> >> GRangesList, i think that the show method for CompressedVRangesList
> >> objects should be inherited from a VRangesList object. right now
> >> this is the situation:
> >>
> >> library(VariantAnnotation)
> >>
> >> example(VRangesList)
> >> vrl
> >> VRangesList of length 2
> >> names(2): sampleA sampleB
> >>
> >> cvrl <- new("CompressedVRangesList", split(vr,
> >> sampleNames(vr)))
> >> cvrl
> >> CompressedVRangesList object of length 2:
> >> $a
> >> VRanges object with 1 range and 1 metadata column:
> >> seqnames ranges strand ref alt
> >> totalDepth refDepth altDepth
> >> <Rle> <IRanges> <Rle> <character> <characterOrRle> <integerOrRle>
> >> <integerOrRle> <integerOrRle>
> >> [1] chr1 [1, 5] + T
> >> C 12 5 7
> >> sampleNames softFilterMatrix | tumorSpecific
> >> <factorOrRle> <matrix> | <logical>
> >> [1] a TRUE | FALSE
> >>
> >> $b
> >> VRanges object with 1 range and 1 metadata column:
> >> seqnames ranges strand ref alt totalDepth refDepth
> >> altDepth
> >> sampleNames softFilterMatrix |
> >> [1] chr2 [10, 20] + A T 17 10
> >> 6 b FALSE |
> >> tumorSpecific
> >> [1] TRUE
> >>
> >> -------
> >> seqinfo: 2 sequences from an unspecified genome; no seqlengths
> >>
> >> would it be possible to have the VRangesList show method for
> >> CompressedVRangesList objects?
> >>
> >> robert.
> >>
> >>
> >>
> >> On 2/24/15 7:24 PM, Michael Lawrence wrote:
> >>
> >> I think you might be missing an import. It should inherit the
> >> method for GRangesList.
> >>
> >> On Tue, Feb 24, 2015 at 9:53 AM, Robert Castelo
> >> <robert.castelo at upf.edu> wrote:
> >>
> >> hi,
> >>
> >> i'm using the CompressedVRangesList class in VariantFiltering
> >> to hold variants and their annotations across multiple samples
> >> and found that there was no show method for this class (unless
> >> i'm missing the right import here) so i made one within
> >> VariantFiltering by copying&pasting from other similar classes:
> >>
> >> setMethod("show",
> >>           signature(object="CompressedVRangesList"),
> >>           function(object) {
> >>             lo <- length(object)
> >>             cat(classNameForDisplay(object), " of length ", lo, "\n",
> >>                 sep = "")
> >>             if (!is.null(names(object)))
> >>               cat(BiocGenerics:::labeledLine("names", names(object)))
> >>           })
> >>
> >> i guess, however, that the right home for this would be
> >> VariantAnnotation. let me know if you consider adding it there
> >> (or somewhere else) and i'll remove it from VariantFiltering.
> >>
> >> thanks,
> >>
> >> robert.
> >>
> >>
> >> _______________________________________________
> >> Bioc-devel at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>
> >>
> >>
> >>
> >>
> >> --
> >> Robert Castelo, PhD
> >> Associate Professor
> >> Dept. of Experimental and Health Sciences
> >> Universitat Pompeu Fabra (UPF)
> >> Barcelona Biomedical Research Park (PRBB)
> >> Dr Aiguader 88
> >> E-08003 Barcelona, Spain
> >> telf: +34.933.160.514
> >> fax: +34.933.160.550
> >>
> >>
> >>
> >
>
>
>
>
>
> --
> Gabriel Becker, Ph.D
> Computational Biologist
> Genentech Research
>
>
>
--
Robert Castelo, PhD
Associate Professor
Dept. of Experimental and Health Sciences
Universitat Pompeu Fabra (UPF)
Barcelona Biomedical Research Park (PRBB)
Dr Aiguader 88
E-08003 Barcelona, Spain
telf: +34.933.160.514
fax: +34.933.160.550