[Bioc-sig-seq] aggregating a RangedData over an IRanges object

Patrick Aboyoun paboyoun at fhcrc.org
Tue Jul 14 05:21:15 CEST 2009


I just checked in a change to BioC 2.5 (devel) to make the coverage 
method for RangedData accept character strings as arguments. So now the 
function call looks a bit cleaner.

 > suppressMessages(library(IRanges))
 > v3 <- RangedData(IRanges(start = c(1, 4, 6), width=c(3, 2, 4)),
+                   score = c(2, 7, 3))
 > mycoverage <- coverage(v3, weight = "score")
 > mycoverage[[1]]
  'integer' Rle instance of length 9 with 3 runs
  Lengths:  3 2 4
  Values :  2 7 3

This change should be pushed to bioconductor.org on the July 14th build 
of BioC 2.5 (devel).


Patrick



Patrick Aboyoun wrote:
> Simon,
> There is a coverage function for RangedData objects you can use. From 
> there you can calculate the exon AUCs using the viewSums function.
>
>
> > suppressMessages(library(IRanges))
> > v3 <- RangedData(IRanges(start = c(1, 4, 6), width=c(3, 2, 4)),
> +                  score = c(2, 7, 3))
> > mycoverage <- coverage(v3, weight = lapply(values(v3), "[[", "score"))
> > mycoverage
> SimpleRleList: 1 element
> > mycoverage[[1]]
>  'integer' Rle instance of length 9 with 3 runs
>  Lengths:  3 2 4
>  Values :  2 7 3
> > myexons <- IRangesList(IRanges(start = c(2, 6), end = c(4, 8)))
> > coverageAUC <- lapply(seq_len(length(mycoverage)),
> +     function(i) viewSums(Views(mycoverage[[i]], myexons[[i]])))
> > coverageAUC
> [[1]]
> [1] 11  9
> > sessionInfo()
> R version 2.10.0 Under development (unstable) (2009-06-28 r48863)
> i386-apple-darwin9.7.0
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base   
> other attached packages:
> [1] IRanges_1.3.34
>
>
>
> Patrick
>
>
> Simon Anders wrote:
>> Hi Michael
>>
>> Michael Lawrence wrote:
>>  
>>>> 1. How can I get the vector referring to chr10 in cvg.bl1? Is there 
>>>> a way
>>>> to coerce to Rle or numeric? Supposedly, this only make sense if I 
>>>> subset
>>>> the RangedData object to only contain one chromosome.
>>>>
>>>>       
>>> score(cvg.bl1)
>>> or for any arbitrary column:
>>> cvg.bl1$score
>>>     
>>
>> Sorry, this is not at all what I meant, but this misunderstanding
>> highlights the general issue I see with Rle and RangedData. You see, for
>> me, there is on piece of data, a very long vector, and different ways to
>> store it in memory. Let's say, my vector is
>>
>>   v1 <- c( 2, 2, 2, 7, 7, 3, 3, 3, 3 )
>>
>> I could store this either as an ordinary vector, as above, or as an Rle
>>
>>   v2 <- Rle( c( 2, 7, 3 ), c( 3, 2, 4 ) )
>>
>> or as a RangedData object
>>
>>   v3 <- RangedData(
>>            IRanges( start=c( 1, 4, 6 ), width=c( 3, 2, 4 ) ),
>>            score = c( 2, 7, 3 ) )
>>
>> I can get from v2 to v1 with "as.vector"
>>
>>    > as.vector( v2 )
>>    [1] 2 2 2 7 7 3 3 3 3
>>
>>    > all( as.vector( v2 ) == v1 )
>>    [1] TRUE
>>
>> But it escapes me how I am supposed to get from v3 to v1. This is not
>> what I want:
>>
>>    > score(v3)
>>    [1] 2 7 3
>>
>> This here is overly complicated and works only because there are no gaps
>> between the intervals:
>>
>>    > rep( score(v3), each=width(v3) )
>>    [1] 2 2 2 7 7 7 3 3 3
>>
>>
>>  
>>>> 2. How can I aggregate over the RangedData object? As it contains 
>>>> only one
>>>> chromosome, it should be possible to coerce it to an Rle vector, 
>>>> but there
>>>> is no coercion method, and, it should be possible anyway to avoid 
>>>> this.
>>>>
>>>>       
>>> Just use Views.
>>> Like:
>>> v <- Views(score(cvg.bl1), ranges(exons)[[1]])
>>> viewSums(v)
>>>     
>>
>> Actually, it does not work:
>>
>>  
>>> v <- Views(score(cvg.bl1), ranges(exons)[[1]])
>>>     
>> Error in function (classes, fdef, mtable)  :
>>   unable to find an inherited method for function "Views",
>>   for signature "numeric"
>>
>> The reason is that 'score(cvg.bl1)' is just the numeric vector of
>> values, and the information of the widths of the runs is lost here. So,
>> even if there were a suitable method, it would not do what I want.
>>
>> In order to go on with my previous variables, let's define two index 
>> ranges:
>>
>>   > r1 <- list( 2:3, 7:8 )
>>
>> With IRanges, I would rather write
>>
>>   > r2 <- IRanges( start=c(2,7), width=c(2,2) )
>>
>> to denote the same.
>>
>> So, what I want is the sum of the indexed entries. With standard R
>> vectors, I write
>>
>>   > sapply( r1, function(i) sum( v1[i] ) )
>>   [1] 4 6
>>
>> With IRanges and Rle, I do
>>
>>   > aggregate( v2, r2, sum )
>>   [1] 4 6
>>
>> to get the same. But what should I do if I have the RangedData v3
>> instead of the Rle v2? The aggregate method does not support RangedData:
>>
>>    > aggregate( v3, r2, sum )
>>    Error in FUN(window(x, start = start[i], end = end[i]), ...) :
>>      invalid 'type' (S4) of argument
>>
>> One might argue that aggregate should not support it, because a
>> GenomeData object is a list of vectors, and an IRanges is not. Then, it
>> would be nice if this here worked:
>>
>>    > r3 <- IRangesList( score = r2 )
>>    > aggregate( v3, r3, sum )
>>
>>
>> What I mean is that aggregate on the signature c( "RangedData",
>> "IRangesList", "FUN" ) could be defined to go through the named (or
>> indexed) elements of both its first and seond argument, aggregate these
>> and return, in this case, list( score=c(4,6) ), and in general a list
>> with one entry per field that both arguments share.
>>
>> Note that your suggestion with Views works for IRanges and Rle:
>>
>>    > Views( v2, r2 )
>>      Views on a 9-length Rle subject
>>
>>    views:
>>        start end width
>>    [1]     2   3     2 [2 2]
>>    [2]     7   8     2 [3 3]
>>    > viewSums( Views( v2, r2 ) )
>>    [1] 4 6
>>
>> but again, it cannot handle RangedData objects:
>>
>>    > Views( v3, r2 )
>>    Error in function (classes, fdef, mtable)  :
>>      unable to find an inherited method for function "Views", for
>>      signature "RangedData"
>>
>>    > Views( v3, r3 )
>>    Error in function (classes, fdef, mtable)  :
>>      unable to find an inherited method for function "Views", for
>>      signature "RangedData"
>>
>>
>>
>> Either, I am missing something, or RangedData is missing something.
>>
>>
>> Cheers
>>   Simon
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing



More information about the Bioc-sig-sequencing mailing list