[BioC] subset a RleList using GRanges object?
Cook, Malcolm
MEC at stowers.org
Wed Dec 4 20:43:44 CET 2013
Janet, et al
I believe there are problems with all approaches mentioned so far on this thread.
When you index something, length of the result should equal the length of the index, AND, the order of the elements of the result should correspond to the order of the indices.
The approaches mentioned all depend upon coersion from GRanges to RangesList, but this does not preserve the order of the GRanges.
Look:
> example(GRangesList)
> as(gr3,'RangesList')
IRangesList of length 2
$chr1
IRanges of length 1
start end width
[1] 1 3 3
$chr2
IRanges of length 1
start end width
[1] 4 9 6
!> as(gr3[2:1],'RangesList')
IRangesList of length 2
$chr1
IRanges of length 1
start end width
[1] 1 3 3
$chr2
IRanges of length 1
start end width
[1] 4 9 6
I think if you contrive for your GRanges to be in 'canonical order' (as defined by the RangesList coersion), you can proceed with this approach. Look:
> stopifnot(all.equal(as(sort(gr3[2:1]),'RangesList'),as(sort(gr3[1:2]),'RangesList')))
Janet, you brought this up in
http://thread.gmane.org/gmane.science.biology.informatics.conductor/36247/focus=36263
with more or less the same outcome.
Here is excerpt from that exchange:
>> An "order" method would be great. Note though that the coercion to
>> RangesList only sorts by seqlevels, so we would need something that gave
>> out that order vector. The point here is to avoid forcing the user to
>> modify the order of the original GRanges.
>>
>
> With order() the user will still be able to achieve this with something
> like:
>
> regionMeans <- function(regions, cvg)
> {
> seqlevels(regions) <- names(cvg)
> oo <- order(regions)
> regions <- regions[oo]
> ans <- unlist(
> viewMeans(
> Views(cvg, as(regions, "RangesList"))
> ), use.names=FALSE)
> ans[oo] <- ans # restore original order
> ans
> }
>
> values(myRegions)$meanCov <- regionMeans(myRegions, myRleList)
>
> Tricky :-/
Let's not forget this lesson!
However, I agree, some syntactic sugar that allows indexing an RleList by GRanges with the proposed semantics would be great.
In the mean time, I offer this function, which I found to be faster than Herve's offering above, and also provides a few more useful (to me) abstractions.
YMMV:
applyAt <- function(cvg # an RleList (quite possibly representing a genomic coverage)
,gr # a GRanges at each element of which you want to evaluate a function
,FUN=identity # by default, just return individual rleLists for each element of gr
,... # additional options to FUN
,.MAPPLY=mapply #optionally parallize with mcmapply or your favorite variant
,USE.NAMES=TRUE
,SIMPLIFY=TRUE #'array'#TRUE
,MoreArgs=NULL # passed to .MAPPLY
,ignore.strand=TRUE # else, reverse the coverages
) {
revif<-function(s,x) {
if(ignore.strand==TRUE) {x}
else if (s=='+') {x}
else {rev(x)}
}
ans<-.MAPPLY(function(a,b,c,s) {
FUN(revif(s,cvg[[a]][IRanges(b,c),drop=FALSE]),...)
}
,as.vector(seqnames(gr)),start(gr),end(gr),as.vector(strand(gr))
,MoreArgs=MoreArgs
,SIMPLIFY=SIMPLIFY
,USE.NAMES=FALSE #USE.NAMES
)
if (!USE.NAMES) {}
else if (is.null(dim(ans))) {names(ans)<-names(gr)}
else if (SIMPLIFY=='array') {colnames(ans)<-names(gr)}
else rownames(ans)<-names(gr)
ans
}
Finally, what I think we REALLY want is a version of the above which currys a FUN (possibly `identity`) down to rle levels that can be applied to a bigwig file without having to load the whole thing into memory.
In the mean time, I offer:
applyAt.bw<-function(x,y,...) {
applyAt(import.bw(x,asRle=TRUE
,selection=BigWigSelection(y)
)
,y
,...)
}
~Malcolm
>-----Original Message-----
>From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of Janet Young
>Sent: Wednesday, December 04, 2013 12:59 PM
>To: Hervé Pagès
>Cc: Michael Lawrence; bioconductor at r-project.org
>Subject: Re: [BioC] subset a RleList using GRanges object?
>
>Thanks, all. Yes, that coercion will do it for me: z[as(myRange, "RangesList")]. It's always nice for the end user when you take care of
>those kinds of things behind the scenes for us - sounds like what you're considering implementing might take care of it.
>
>Robert: thanks for the ideas! But my needs are a little different, though - I'm keeping the individual values in the region I'm interested
>in (I'm plotting RNA-seq coverage on just a single gene, but my coverage object is for the whole genome).
>
>Janet
>
>
>
>On Dec 4, 2013, at 10:09 AM, Hervé Pagès wrote:
>
>> Hi Michael,
>>
>> On 12/03/2013 07:13 PM, Michael Lawrence wrote:
>>> For now, just coerce to a RangesList.
>>>
>>> z [ as(myRange, "RangesList") ]
>>>
>>> Herve might consider adding a [,List,GenomicRanges method... kind of a
>>> logical leap though from seqnames to list element names.
>>
>> This is something I've been considering for a while. I think we should
>> do it. The mapping between the seqnames of a GRanges and the names of a
>> RangesList is well established at this point e.g. the coercion (back and
>> forth) between the two already does that.
>>
>> Cheers,
>> H.
>>
>>>
>>> extractCoverageForPositions is different; it extracts an integer vector
>>> given a vector of width-one positions.
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Dec 3, 2013 at 4:53 PM, Janet Young <jayoung at fhcrc.org> wrote:
>>>
>>>> Hi there,
>>>>
>>>> I'm playing around with coverage data generated outside of R, planning to
>>>> plot RNA-seq coverage for some genes we're interested in.
>>>>
>>>> I have a request - it'd be nice from my point of view if it were possible
>>>> to look at a small region an RleList (or CompressedRleList) using a GRanges
>>>> object to focus on that smaller region. Looks like I can subset a single
>>>> Rle object with an IRanges object, but I wonder if this nice feature could
>>>> be extended to GRanges objects?
>>>>
>>>> Some code is below to show what I'm trying to do.
>>>>
>>>> thanks veruy much,
>>>>
>>>> Janet
>>>>
>>>>
>>>>
>>>> library(GenomicRanges)
>>>>
>>>> ## an example RleList
>>>> x <- Rle(10:1, 1:10)
>>>> y <- Rle(10:1, 1:10)
>>>> z <- RleList( chr1=x, chr2=y)
>>>>
>>>> ## an example GRanges object
>>>> myRange <- GRanges( seqnames="chr1", ranges=IRanges(start=10,end=15) )
>>>>
>>>> ## subsetting an Rle using an IRanges object works, as expected:
>>>> z[["chr1"]] [ ranges(myRange) ]
>>>>
>>>> ## but subsetting an RleList by GRanges object doesn't work
>>>> z [ myRange ]
>>>> # Error in normalizeSingleBracketSubscript(i, x) : invalid subscript type
>>>>
>>>> sessionInfo()
>>>>
>>>> R Under development (unstable) (2013-11-06 r64163)
>>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>>
>>>> locale:
>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>
>>>> attached base packages:
>>>> [1] parallel stats graphics grDevices utils datasets methods
>>>> [8] base
>>>>
>>>> other attached packages:
>>>> [1] GenomicRanges_1.15.10 XVector_0.3.2 IRanges_1.21.13
>>>> [4] BiocGenerics_0.9.1
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] stats4_3.1.0
>>>>
>>>>
>>>>
>>>>
>>>> -------------------------------------------------------------------
>>>>
>>>> Dr. Janet Young
>>>>
>>>> Malik lab
>>>> http://research.fhcrc.org/malik/en.html
>>>>
>>>> Fred Hutchinson Cancer Research Center
>>>> 1100 Fairview Avenue N., A2-025,
>>>> P.O. Box 19024, Seattle, WA 98109-1024, USA.
>>>>
>>>> tel: (206) 667 4512
>>>> email: jayoung ...at... fhcrc.org
>>>>
>>>> -------------------------------------------------------------------
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages at fhcrc.org
>> Phone: (206) 667-5791
>> Fax: (206) 667-1319
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at r-project.org
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list