[BioC] IRanges: cbind not well defined for RangedData?
Michael Dondrup
Michael.Dondrup at uni.no
Fri Mar 19 15:23:03 CET 2010
Dear Patrick and Michael,
thank you very much for your helpful support on my last two related issues! The answer is in fact in
the examples in the documentation, but I must have overlooked it.
I tried it out immediately, and it works fine:
> rd = RangedData(IRanges(start=1:4, width=10, names=paste("a",1:4)), space=1:2 )
> rd
> rd$a.value = rnorm(4)
> rd
RangedData with 4 rows and 1 value column across 2 spaces
        space    ranges |    a.value
  <character> <IRanges> |  <numeric>
1           1   [1, 10] | -0.6765515
2           1   [3, 12] |  1.5406962
3           2   [2, 11] | -1.2599696
4           2   [4, 13] |  0.4971178
But then I had to reboot my computer, because I accidentally tried this on 100,000 ranges
with a value that was actually a list, not a vector, and the recycling rule struck:
> rd$a.list = as.list(1:4)
At first everything seems fine and normal, but if you try to print it:
> rd
RangedData with 4 rows and 1 value column across 2 spaces
Error in .Method(..., deparse.level = deparse.level) :
number of rows of matrices must match (see arg 2)
or try to convert it into a data.frame:
> as.data.frame(rd)
  space start end width names a.list.1L a.list.2L a.list.3L a.list.4L
1     1     1  10    10   a 1         1         2         3         4
2     1     3  12    10   a 3         1         2         3         4
3     2     2  11    10   a 2         1         2         3         4
4     2     4  13    10   a 4         1         2         3         4
When I tried this on the large object, R ran into memory problems.
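
The columns a.list.1L to a.list.4L look like what base R's data.frame() does with a plain list: each list element becomes its own column, and the resulting single row is recycled over all rows. The following is a minimal base R sketch of that rule, not a trace of the IRanges internals (which may differ); the names ids, vals, one_row, wide and tall are made up for illustration.

```r
# Base R stand-in for the situation above; no IRanges objects are involved.
ids  <- 1:4
vals <- as.list(1:4)   # a list of four length-1 elements, as in rd$a.list

# A plain list is read as a set of columns: one row, four columns.
one_row <- as.data.frame(vals)
dim(one_row)

# Next to a 4-row column, that single row is recycled to all four rows.
wide <- data.frame(id = ids, a.list = vals)
dim(wide)

# Flattening the list first gives the intended single column.
tall <- data.frame(id = ids, a.value = unlist(vals))
dim(tall)
```

If the same rule applies to a list of 100,000 elements on 100,000 ranges, the expanded table would have 100,000 x 100,000 = 10^10 cells, which would be consistent with the memory problems.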
This is just a warning to make sure you really use a vector here. Maybe this is something to add to
the type checking or the documentation?
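
A minimal sketch of what such a check could look like on the user side, in base R only. The helper check_value_column is hypothetical and not part of IRanges; it simply rejects anything that is not an atomic vector with one element per row.

```r
# Hypothetical helper, not part of IRanges: accept only atomic vectors
# with exactly one element per row.
check_value_column <- function(value, n_rows) {
  if (is.null(value) || !is.atomic(value)) {
    stop("value must be an atomic vector, not a ", class(value)[1])
  }
  if (length(value) != n_rows) {
    stop("value has length ", length(value), ", expected ", n_rows)
  }
  invisible(value)
}

ok  <- check_value_column(rnorm(4), 4)                          # passes
bad <- try(check_value_column(as.list(1:4), 4), silent = TRUE)  # rejected
inherits(bad, "try-error")

# A list of length-1 elements can be flattened before assignment.
flat <- unlist(as.list(1:4))
is.atomic(flat)
```

Calling the helper before an assignment such as rd$a.value = value would stop a list from slipping through; nrow(rd) is the assumed source of n_rows.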
Anyway, thanks a lot again
Michael
On Mar 18, 2010, at 6:55 PM, Patrick Aboyoun wrote:
> I have been experimenting with S4 dispatch on ... (optional arguments) and reading the man page for dotsMethods
>
> > help(dotsMethods)
>
> Long story short, adding support for cbind-ing a vector to an S4 object would probably involve either
>
> 1) creating an S4 class union of an S4 class (e.g. RangedData) with vector so the existing S4 dispatch would choose the correct method or
> 2) creating an S4 default method for cbind that has its own dispatch mechanism for choosing a cbind method.
>
> I don't find either of these options appealing, and I second Michael Lawrence's suggestion of using "$<-" or "[[<-" to bind new columns to a RangedData object.
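
As a rough illustration of why a mixed call ends up in the "ANY" method: a method defined on ... is only selected when every argument passed through ... is of that class or a subclass. The class Toy and the function bindAll below are hypothetical stand-ins for RangedData and cbind, so this is a sketch of the dispatch rule described in help(dotsMethods), not of the IRanges code.

```r
library(methods)

# Hypothetical toy class standing in for RangedData.
setClass("Toy", representation(x = "numeric"))

# Hypothetical function standing in for cbind; its body becomes the
# default ("ANY") method once it is made generic.
bindAll <- function(...) "default method"
setGeneric("bindAll")

# A method on `...` applies only when all arguments are Toy objects.
setMethod("bindAll", "Toy", function(...) "Toy method")

t1 <- new("Toy", x = 1)
all_toy <- bindAll(t1, t1)         # every argument is a Toy
mixed   <- bindAll(t1, c(0.5, 2))  # Toy plus numeric: no common method
print(c(all_toy, mixed))
```

The mixed call falls back to the default method, which mirrors cbind(rd1, a.value) being handled by the method for "ANY".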
>
> > a.value <- rnorm(4)
> > rd1 <- RangedData(ranges=IRanges(start=runif(4, min=1, max=10E8), width=runif(4, min=1, max=10E5), names=paste("bla",1:4)), space=1:2)
> > obj <- cbind(rd1, a.value)
> > showMethods("cbind")
> Function: cbind (package IRanges)
> ...="ANY"
> ...="DataFrame"
> ...="DataFrameList"
> ...="DataTable"
> ...="numeric#RangedData"
> (inherited from: ...="ANY")
>
> > df1 <- unlist(values(rd1))
> > class(df1)
> [1] "DataFrame"
> attr(,"package")
> [1] "IRanges"
> > cbind(df1, a.value)
>      df1 a.value
> [1,] ?   -0.6268173
> [2,] ?   2.540871
> [3,] ?   0.4137926
> [4,] ?   -0.897856
> > showMethods("cbind")
> Function: cbind (package IRanges)
> ...="ANY"
> ...="DataFrame"
> ...="DataFrame#numeric"
> (inherited from: ...="ANY")
> ...="DataFrameList"
> ...="DataTable"
> ...="numeric#RangedData"
> (inherited from: ...="ANY")
>
> > sessionInfo()
> R version 2.11.0 Under development (unstable) (2010-03-14 r51276)
> i386-apple-darwin9.8.0
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] IRanges_1.5.64
>
>
> On 3/18/10 10:32 AM, Michael Lawrence wrote:
>> On Thu, Mar 18, 2010 at 7:55 AM, Michael Dondrup <Michael.Dondrup at uni.no> wrote:
>>
>>
>>> Hi,
>>> here is another possible little glitch with RangedData and cbind().
>>> Actually, I would like to propose to
>>> change or expand the behavior of the cbind function or to add to its
>>> documentation. The use case is as
>>> follows:
>>> Assume we have some chromosomal Ranges in a RangedData object. Then we can
>>> iteratively compute statistics on
>>> these ranges and attach them to the DataFrame holding extra data, e.g. some
>>> count data or combined quality scores, possibly from multiple conditions.
>>>
>>> So according to the documentation of the RangedData-class,
>>>
>>>> The first mode treats the object as a contiguous "data frame" annotated
>>>> with range information. The accessors start, end, and width get the
>>>> corresponding fields in the ranges as atomic integer vectors, undoing
>>>> the division over the spaces. The [[ and matrix-style [, extraction and
>>>> subsetting functions unroll the data in the same way. [[<- does the inverse.
>>>
>>> I assume I could use cbind(rd, a.value) to attach the statistics to the
>>> internal data representation. So would it be possible to
>>> make cbind return something more useful, or are there better ways to do it?
>>>
>>>
>>>
>>>
>> Right now it's just using the cbind method for "ANY", because one does not
>> exist for RangedData. To be honest, I've always just used the $<- syntax for
>> adding the statistics. This seems like it would work well in your use case,
>> as well.
>>
>> Like:
>>
>> rd$a.value <- a.value
>>
>> Michael
>>
>>
>>
>>
>>> Best
>>> Michael
>>>
>>>
>>> Example:
>>>
>>>
>>> > a.value = rnorm(4)
>>> > rd1 = RangedData(ranges=IRanges(start=runif(4, min=1, max=10E8),
>>>     width=runif(4, min=1, max=10E5), names=paste("bla",1:4)), space=1:2)
>>> > rd1
>>> RangedData with 4 rows and 0 value columns across 2 spaces
>>>             space                 ranges |
>>>       <character>              <IRanges> |
>>> bla 1           1 [773679042, 774010137] |
>>> bla 3           1 [194819013, 195136171] |
>>> bla 2           2 [183105318, 183509803] |
>>> bla 4           2 [107730452, 107823748] |
>>>
>>> > obj = cbind(rd1, a.value)
>>>
>>> And I would intuitively assume the result to look exactly like this:
>>>
>>>
>>> > RangedData(ranges=IRanges(start=runif(4, min=1, max=10E8), width=runif(4,
>>>     min=1, max=10E5), names=paste("bla",1:4)), space=1:2, a.value)
>>> RangedData with 4 rows and 1 value column across 2 spaces
>>>             space                 ranges |    a.value
>>>       <character>              <IRanges> |  <numeric>
>>> bla 1           1 [473042533, 473820859] | -1.7956588
>>> bla 3           1 [ 75991383,  76022516] |  0.3588571
>>> bla 2           2 [475385363, 476224756] |  1.4166218
>>> bla 4           2 [532603052, 532902678] |  0.2324424
>>>
>>> But what I get is much different:
>>>
>>>
>>> > class(obj)
>>> [1] "matrix"
>>> > typeof(obj)
>>> [1] "list"
>>> > obj
>>>      rd1 a.value
>>> [1,] ?   0.3255676
>>> [2,] ?   0.5913471
>>> [3,] ?   0.9317755
>>> [4,] ?   -0.8897527
>>>
>>> > sessionInfo()
>>> R version 2.10.1 (2009-12-14)
>>> x86_64-apple-darwin9.8.0
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] IRanges_1.4.9
>>>
>>> loaded via a namespace (and not attached):
>>> [1] tools_2.10.1
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>>