[Bioc-sig-seq] aggregating a RangedData over an IRanges object

Simon Anders anders at ebi.ac.uk
Tue Jul 14 00:20:30 CEST 2009


Hi Michael

Michael Lawrence wrote:
>> 1. How can I get the vector referring to chr10 in cvg.bl1? Is there a way
>> to coerce to Rle or numeric? Supposedly, this only make sense if I subset
>> the RangedData object to only contain one chromosome.
>>
> 
> score(cvg.bl1)
> or for any arbitrary column:
> cvg.bl1$score

Sorry, this is not at all what I meant, but this misunderstanding
highlights the general issue I see with Rle and RangedData. You see, for
me, there is on piece of data, a very long vector, and different ways to
store it in memory. Let's say, my vector is

  v1 <- c( 2, 2, 2, 7, 7, 3, 3, 3, 3 )

I could store this either as an ordinary vector, as above, or as an Rle

  v2 <- Rle( c( 2, 7, 3 ), c( 3, 2, 4 ) )

or as a RangedData object

  v3 <- RangedData(
           IRanges( start=c( 1, 4, 6 ), width=c( 3, 2, 4 ) ),
           score = c( 2, 7, 3 ) )

I can get from v2 to v1 with "as.vector"

   > as.vector( v2 )
   [1] 2 2 2 7 7 3 3 3 3

   > all( as.vector( v2 ) == v1 )
   [1] TRUE

But it escapes me how I am supposed to get from v3 to v1. This is not
what I want:

   > score(v3)
   [1] 2 7 3

This here is overly complicated and works only because there are no gaps
between the intervals:

   > rep( score(v3), each=width(v3) )
   [1] 2 2 2 7 7 7 3 3 3


>> 2. How can I aggregate over the RangedData object? As it contains only one
>> chromosome, it should be possible to coerce it to an Rle vector, but there
>> is no coercion method, and, it should be possible anyway to avoid this.
>>
> 
> Just use Views.
> Like:
> v <- Views(score(cvg.bl1), ranges(exons)[[1]])
> viewSums(v)

Actually, it does not work:

> v <- Views(score(cvg.bl1), ranges(exons)[[1]])
Error in function (classes, fdef, mtable)  :
  unable to find an inherited method for function "Views",
  for signature "numeric"

The reason is that 'score(cvg.bl1)' is just the numeric vector of
values, and the information of the widths of the runs is lost here. So,
even if there were a suitable method, it would not do what I want.

In order to go on with my previous variables, let's define two index ranges:

  > r1 <- list( 2:3, 7:8 )

With IRanges, I would rather write

  > r2 <- IRanges( start=c(2,7), width=c(2,2) )

to denote the same.

So, what I want is the sum of the indexed entries. With standard R
vectors, I write

  > sapply( r1, function(i) sum( v1[i] ) )
  [1] 4 6

With IRanges and Rle, I do

  > aggregate( v2, r2, sum )
  [1] 4 6

to get the same. But what should I do if I have the RangedData v3
instead of the Rle v2? The aggregate method does not support RangedData:

   > aggregate( v3, r2, sum )
   Error in FUN(window(x, start = start[i], end = end[i]), ...) :
     invalid 'type' (S4) of argument

One might argue that aggregate should not support it, because a
GenomeData object is a list of vectors, and an IRanges is not. Then, it
would be nice if this here worked:

   > r3 <- IRangesList( score = r2 )
   > aggregate( v3, r3, sum )


What I mean is that aggregate on the signature c( "RangedData",
"IRangesList", "FUN" ) could be defined to go through the named (or
indexed) elements of both its first and seond argument, aggregate these
and return, in this case, list( score=c(4,6) ), and in general a list
with one entry per field that both arguments share.

Note that your suggestion with Views works for IRanges and Rle:

   > Views( v2, r2 )
     Views on a 9-length Rle subject

   views:
       start end width
   [1]     2   3     2 [2 2]
   [2]     7   8     2 [3 3]
   > viewSums( Views( v2, r2 ) )
   [1] 4 6

but again, it cannot handle RangedData objects:

   > Views( v3, r2 )
   Error in function (classes, fdef, mtable)  :
     unable to find an inherited method for function "Views", for
     signature "RangedData"

   > Views( v3, r3 )
   Error in function (classes, fdef, mtable)  :
     unable to find an inherited method for function "Views", for
     signature "RangedData"



Either, I am missing something, or RangedData is missing something.


Cheers
  Simon



More information about the Bioc-sig-sequencing mailing list