[BioC] IRanges: cbind not well defined for RangedData?

Michael Dondrup Michael.Dondrup at uni.no
Fri Mar 19 15:23:03 CET 2010


Dear Patrick and Michael,

Thank you very much for your helpful support on my last two related issues! It is in the
examples in the documentation, but I must have overlooked it.

I tried it out immediately, and it works fine:

> rd = RangedData(IRanges(start=1:4, width=10, names=paste("a",1:4)), space=1:2 )
> rd
> rd$a.value = rnorm(4)
> rd
RangedData with 4 rows and 1 value column across 2 spaces
        space    ranges |    a.value
  <character> <IRanges> |  <numeric>
1           1   [1, 10] | -0.6765515
2           1   [3, 12] |  1.5406962
3           2   [2, 11] | -1.2599696
4           2   [4, 13] |  0.4971178

But then I had to reboot my computer, because I accidentally tried this on 100,000 ranges
where the value was actually a list, not a vector, and then the recycling rule struck me:

> rd$a.list = as.list(1:4)
At first everything seems fine and normal, but if you try to print it:
> rd
RangedData with 4 rows and 1 value column across 2 spaces
Error in .Method(..., deparse.level = deparse.level) : 
  number of rows of matrices must match (see arg 2)
or try to convert it into a data.frame:
> as.data.frame(rd)
  space start end width names a.list.1L a.list.2L a.list.3L a.list.4L
1     1     1  10    10   a 1         1         2         3         4
2     1     3  12    10   a 3         1         2         3         4
3     2     2  11    10   a 2         1         2         3         4
4     2     4  13    10   a 4         1         2         3         4

When I tried this, R ran into some memory problems.

This is just meant as a warning to make sure you really use a vector here. Maybe it is something
to add to the type checking or to the documentation?
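
To illustrate the recycling effect without a large object, here is a minimal base R sketch. It assumes that the conversion shown above behaves like data.frame() does for a list argument, which the column names a.list.1L ... a.list.4L suggest. The helper check_column is hypothetical and not part of IRanges; it only shows what such a type check could look like.

```r
# Base R only; no IRanges needed for this illustration.
vals <- as.list(1:4)   # a list of four length-1 elements

# data.frame() treats each list element as its own column and recycles
# every length-1 column to the full number of rows.
df <- data.frame(start = 1:4, a.list = vals)
dim(df)                # 4 rows, 1 + 4 columns

# Hypothetical guard (not part of IRanges): accept only an atomic vector
# with exactly one entry per row.
check_column <- function(value, n) {
  if (is.null(value) || !is.atomic(value))
    stop("value must be an atomic vector, got: ", class(value)[1])
  if (length(value) != n)
    stop("value has length ", length(value), ", expected ", n)
  value
}

# unlist() turns the list into an integer vector, which passes the check.
a.vec <- check_column(unlist(vals), 4L)

# The list itself is rejected.
rejected <- inherits(try(check_column(vals, 4L), silent = TRUE), "try-error")
```

With a RangedData object one could then write something like rd$a.value = check_column(x, nrow(rd)), so that a list is rejected before it is ever recycled across 100,000 rows.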

Anyway, thanks a lot again
Michael


On Mar 18, 2010, at 6:55 PM, Patrick Aboyoun wrote:

> I have been experimenting with S4 dispatch on ... (optional arguments) and reading the man page for dotsMethods
> 
> > help(dotsMethods)
> 
> Long story short, adding support for cbind-ing a vector to an S4 object would probably involve either
> 
> 1) creating an S4 class union of an S4 class (e.g. RangedData) with vector so the existing S4 dispatch would choose the correct method or
> 2) creating an S4 default method for cbind that has its own dispatch mechanism for choosing a cbind method.
> 
> I don't find either of these options appealing, and I second Michael Lawrence's suggestion of using "$<-" or "[[<-" to bind new columns to a RangedData object.
> 
> > a.value <- rnorm(4)
> > rd1 <- RangedData(ranges=IRanges(start=runif(4, min=1, max=10E8), width=runif(4, min=1, max=10E5), names=paste("bla",1:4)), space=1:2)
> > obj <- cbind(rd1, a.value)
> > showMethods("cbind")
> Function: cbind (package IRanges)
> ...="ANY"
> ...="DataFrame"
> ...="DataFrameList"
> ...="DataTable"
> ...="numeric#RangedData"
>    (inherited from: ...="ANY")
> 
> > df1 <- unlist(values(rd1))
> > class(df1)
> [1] "DataFrame"
> attr(,"package")
> [1] "IRanges"
> > cbind(df1, a.value)
>     df1 a.value
> [1,] ?   -0.6268173
> [2,] ?   2.540871
> [3,] ?   0.4137926
> [4,] ?   -0.897856
> > showMethods("cbind")
> Function: cbind (package IRanges)
> ...="ANY"
> ...="DataFrame"
> ...="DataFrame#numeric"
>    (inherited from: ...="ANY")
> ...="DataFrameList"
> ...="DataTable"
> ...="numeric#RangedData"
>    (inherited from: ...="ANY")
> 
> > sessionInfo()
> R version 2.11.0 Under development (unstable) (2010-03-14 r51276)
> i386-apple-darwin9.8.0
> 
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] IRanges_1.5.64
> 
> 
> On 3/18/10 10:32 AM, Michael Lawrence wrote:
>> On Thu, Mar 18, 2010 at 7:55 AM, Michael Dondrup <Michael.Dondrup at uni.no> wrote:
>> 
>>   
>>> Hi,
>>> here is another possible little glitch with RangedData and cbind();
>>> actually, I would like to propose to change or expand the behavior of the
>>> cbind function, or to add to its documentation. The use case is as
>>> follows:
>>> Assume we have some chromosomal ranges in a RangedData object. Then we can
>>> iteratively compute statistics on these ranges and attach them to the
>>> DataFrame holding extra data, e.g. some count data or combined quality
>>> scores, possibly from multiple conditions.
>>> 
>>> So according to the documentation of the RangedData-class,
>>>
>>>> The first mode treats the object as a contiguous "data frame" annotated
>>>> with range information.
>>>> The accessors start, end, and width get the corresponding fields in the
>>>> ranges as atomic integer vectors, undoing
>>>> the division over the spaces. The [[ and matrix-style [, extraction and
>>>> subsetting functions unroll the data in the same way. [[<- does the inverse.
>>>
>>> I assume I could use cbind(rd, a.value) to attach the statistics to the
>>> internal data representation. So would it be possible to
>>> make cbind return something more useful, or are there better ways to do it?
>>> 
>>> 
>>> 
>>>     
>> Right now it's just using the cbind method for "ANY", because one does not
>> exist for RangedData. To be honest, I've always just used the $<- syntax for
>> adding the statistics. This seems like it would work well in your use case,
>> as well.
>> 
>> Like:
>> 
>> rd$a.value <- a.value
>> 
>> Michael
>> 
>> 
>> 
>>   
>>> Best
>>> Michael
>>> 
>>> 
>>> Example:
>>> 
>>> > a.value = rnorm(4)
>>> > rd1 = RangedData(ranges=IRanges(start=runif(4, min=1, max=10E8),
>>> width=runif(4, min=1, max=10E5), names=paste("bla",1:4)), space=1:2)
>>> > rd1
>>> RangedData with 4 rows and 0 value columns across 2 spaces
>>>            space                 ranges |
>>>      <character>               <IRanges>  |
>>> bla 1           1 [773679042, 774010137] |
>>> bla 3           1 [194819013, 195136171] |
>>> bla 2           2 [183105318, 183509803] |
>>> bla 4           2 [107730452, 107823748] |
>>>
>>> > obj = cbind(rd1, a.value)
>>>
>>> And I would intuitively assume the result to look exactly like this:
>>> 
>>> > RangedData(ranges=IRanges(start=runif(4, min=1, max=10E8), width=runif(4,
>>> min=1, max=10E5), names=paste("bla",1:4)), space=1:2, a.value)
>>> RangedData with 4 rows and 1 value column across 2 spaces
>>>            space                 ranges |    a.value
>>>      <character>               <IRanges>  |<numeric>
>>> bla 1           1 [473042533, 473820859] | -1.7956588
>>> bla 3           1 [ 75991383,  76022516] |  0.3588571
>>> bla 2           2 [475385363, 476224756] |  1.4166218
>>> bla 4           2 [532603052, 532902678] |  0.2324424
>>> 
>>> But what I get is much different:
>>> 
>>> > class(obj)
>>> [1] "matrix"
>>> > typeof(obj)
>>> [1] "list"
>>>
>>> > obj
>>>     rd1 a.value
>>> [1,] ?   0.3255676
>>> [2,] ?   0.5913471
>>> [3,] ?   0.9317755
>>> [4,] ?   -0.8897527
>>>
>>> > sessionInfo()
>>> R version 2.10.1 (2009-12-14)
>>> x86_64-apple-darwin9.8.0
>>> 
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>> 
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>> 
>>> other attached packages:
>>> [1] IRanges_1.4.9
>>> 
>>> loaded via a namespace (and not attached):
>>> [1] tools_2.10.1
>>> 
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> 


