[BioC] views on Rle using GRanges object

Janet Young jayoung at fhcrc.org
Tue Aug 16 02:25:19 CEST 2011

Hi again,

I have another question, in the "I think I need a convoluted workaround, but maybe I'm missing a simple solution" genre (seems like most of my questions are like that).

I have an RleList object representing mapability for the whole human genome. I'd like to:
(a) calculate viewMeans for various regions of interest that I've been representing as GRanges, and 
(b) I'd like the underlying code to be smart and match chromosome names up in the RleList and the GRanges object (not rely on chromosomes being ordered the same in the two objects), and 
(c) I'd like to return the viewMeans results in the same order as the objects in my original GRanges

I don't think this is currently implemented without doing several coercions (that introduce their own complications) but I'm not sure.  Some code is below that shows what I'm trying to do.   It seems like it might be a common kind of way to use viewMeans - I imagine people are gradually switching to use GRanges more and more? but really I don't know.  

My real world question is that I've read in mapability from a UCSC bigWig file, and made that into an RleList.  I have a bunch of other regions I've read in (again from UCSC) using rtracklayer as GRanges, and have annotated those with various things I'm interested in (e.g. number of RNAseq reads in the region).  I want to add the average mapability for each region, so that I can later look at how mapability affects those other things I'm annotating.

Should I be being more strict with myself about how my GRanges are ordered and making sure they all have the same seqlevels and seqlengths?  Perhaps that would help the coercion workarounds go more smoothly.




### make an RleList. Just for this example, I starting with a GRanges object and used coverage to get an RleList example. To get my real data I read in a UCSC bigWig file.

fakeRegions <- GRanges(seqnames=c("chrA","chrA", "chrB", "chrC"), ranges=IRanges(start=c(1,51,1,1), end=c(60,90,20,10) ) ) 
seqlengths(fakeRegions) <- c(100,100,100)
myRleList <- coverage(fakeRegions)


# SimpleRleList of length 3
# $chrA
# 'integer' Rle of length 100 with 4 runs
#   Lengths: 50 10 30 10
#   Values :  1  2  1  0
# $chrB
# 'integer' Rle of length 100 with 2 runs
#   Lengths: 20 80
#   Values :  1  0
# $chrC
# 'integer' Rle of length 100 with 2 runs
#   Lengths: 10 90
#   Values :  1  0

### make some regions of interest
myRegions <- GRanges(seqnames=c("chrB", "chrC", "chrB"), ranges=IRanges(start=c(1,1,5), end=c(20,20,10) ), geneNames=c("g1","g2","g3") ) 
# GRanges with 3 ranges and 1 elementMetadata value
#     seqnames    ranges strand |   geneNames
#        <Rle> <IRanges>  <Rle> | <character>
# [1]     chrB   [1, 20]      * |          g1
# [2]     chrC   [1, 20]      * |          g2
# [3]     chrB   [5, 10]      * |          g3
# seqlengths
#  chrB chrC
#    NA   NA

## can't use GRanges directly
Views( myRleList, myRegions) 
# Error in RleViewsList(rleList = subject, rangesList = start) : 
#   'rangesList' must be a RangesList object

## can't use a simple coercion
Views( myRleList, as(myRegions,"RangesList") ) 
# Error in .Method(..., FUN = FUN, MoreArgs = MoreArgs, SIMPLIFY = SIMPLIFY,  : 
#   all object lengths must be multiple of longest object length

### although I can use that coercion if I first give the GRanges object the same levels as the RleList to force the lists to have the same names as each other:
seqlevels(myRegions) <- names(myRleList)
viewMeans(Views( myRleList, as(myRegions,"RangesList") ) ) 
# SimpleNumericList of length 3
# [["chrA"]] numeric(0)
# [["chrB"]] 1 1
# [["chrC"]] 0.5

### getting close to a solution - but I'd have liked to have been able to unlist this and directly add to my GRanges object e.g. 
values(myRegions)$meanCov <- unlist(viewMeans(Views( myRleList, as(myRegions,"RangesList") ) ), use.names=FALSE)
### but the regions are in a different order to how I started, so the above command would associate the wrong scores with the wrong regions.


R version 2.13.1 (2011-07-08)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)

[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GenomicRanges_1.4.6 IRanges_1.10.5     

More information about the Bioconductor mailing list