[BioC] IRanges: list columns in RangedData objects (was Re: IRanges: cbind not well defined for RangedData?)
Patrick Aboyoun
paboyoun at fhcrc.org
Sat Mar 20 03:28:41 CET 2010
I've done some testing for as.data.frame on a RangedData object and
found that the existing coercion methodology was producing incorrect
results in certain circumstances when there was a list, SimpleList, or
CompressedList data column due to vector recycling. For now, as.data.frame
for a RangedData object will throw an error if it contains a list,
SimpleList, or CompressedList data column. If there is demand for
as.data.frame supporting list columns, we can take another look at this
issue.
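
For illustration, here is a minimal sketch in base R of the recycling
behavior and of one way to keep a list as a single column. It does not use
IRanges and is not the as.data.frame method itself; has_list_column is a
hypothetical helper named only for this example.

```r
# Base R only; IRanges is not required for this illustration.

# Passing a bare list to data.frame() spreads its elements across columns,
# and each length-one element is recycled down all four rows.
expanded <- data.frame(x = 1:4, y = as.list(2:5))
dim(expanded)

# Wrapping the list in I() keeps it as a single list column.
kept <- data.frame(x = 1:4, y = I(as.list(2:5)))
dim(kept)
is.list(kept$y)

# Hypothetical guard (not the IRanges implementation): report whether any
# column is a list, so a caller can stop instead of silently expanding it.
has_list_column <- function(columns) {
  any(vapply(columns, is.list, logical(1)))
}
has_list_column(list(a.value = c(0.1, 0.2), a.list = as.list(1:2)))
```

A check along these lines could run before conversion, with stop() called
when it returns TRUE.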
Thanks,
Patrick
On 3/19/10 5:14 PM, Patrick Aboyoun wrote:
> Michael L.,
> Given that we have IntegerList objects to store lists of integers, I am
> not inclined to build logic for printing a list column in a DataTable.
> To change the current behavior, the relevant method to work on is
> showAsCell,list-method.
>
> The conversion of a DataTable to a data.frame when the DataTable
> contains some non-atomic columns is a bit dicey. I'm not sure whether a
> data.frame truly supports list columns or whether this was something
> grandfathered in, since data.frame inherits from list. For example, the
> data.frame constructor converts list inputs to multiple columns:
>
> > data.frame(x = 1:4, y = as.list(2:5))
> x y.2L y.3L y.4L y.5L
> 1 1 2 3 4 5
> 2 2 2 3 4 5
> 3 3 2 3 4 5
> 4 4 2 3 4 5
>
> We can circumvent this behavior by decorating a list object with the
> necessary data.frame attributes, but I'm not sure how many methods will
> be able to handle a data.frame with a list column properly.
>
>
> Patrick
>
>
> On 3/19/10 3:46 PM, Michael Lawrence wrote:
>
>>
>> On Fri, Mar 19, 2010 at 12:59 PM, Patrick Aboyoun <paboyoun at fhcrc.org> wrote:
>>
>> Michael,
>> Thanks for the report. RangedData objects have been designed to
>> hold list objects in the values columns. You did, however, find a
>> bug in the printing of a RangedData object when it contains a list
>> column. I fixed the show method in both BioC 2.5 IRanges (>=
>> 1.4.16) and BioC 2.6 IRanges (>= 1.5.66) to handle this case.
>>
>> > rd<- RangedData(IRanges(start=1:4, width=10,
>> names=paste("a",1:4)), space=1:2 )
>> > rd$a.value<- rnorm(4)
>> > rd$a.list<- as.list(1:4)
>> > rd
>> RangedData with 4 rows and 2 value columns across 2 spaces
>> space ranges | a.value a.list
>> <character> <IRanges> |<numeric> <list>
>> a 1 1 [1, 10] | 0.5362468 ########
>> a 3 1 [3, 12] | 0.5459593 ########
>> a 2 2 [2, 11] | 0.4705777 ########
>> a 4 2 [4, 13] | 0.4160833 ########
>>
>>
>> Thanks for doing this Patrick, but what's the deal with the #'s? I
>> mean, how about "1, 2, 3, 4" instead? That's how data.frame prints it.
>>
>> As you noticed, a list column in a RangedData object will result
>> in column expansion if you convert it to a data.frame, which can
>> lead to a large data object if the number of rows in a RangedData
>> object is large.
>>
>>
>> Does this make sense? data.frame can handle list columns.
>>
>> data(mtcars)
>> mtcars$a.list<- list(1:4)
>>
>> Since the show method prints out the classes of each of the
>> columns, the user will be able to check to ensure their data
>> columns are stored correctly prior to any conversion to a data.frame.
>>
>> > as.data.frame(rd)
>>   space start end width names   a.value a.list.1L a.list.2L a.list.3L a.list.4L
>> 1     1     1  10    10   a 1 0.5362468         1         2         3         4
>> 2     1     3  12    10   a 3 0.5459593         1         2         3         4
>> 3     2     2  11    10   a 2 0.4705777         1         2         3         4
>> 4     2     4  13    10   a 4 0.4160833         1         2         3         4
>>
>>
>>
>> Patrick
>>
>>
>> On 3/19/10 7:23 AM, Michael Dondrup wrote:
>>
>> Dear Patrick and Michael,
>>
>> thank you very much for your helpful support on my last two
>> connected issues! It is somehow in
>> the documentation in the examples but I must have overlooked it.
>>
>> I tried it out immediately, and it works fine:
>>
>>
>> rd = RangedData(IRanges(start=1:4, width=10,
>> names=paste("a",1:4)), space=1:2 )
>> rd
>> rd$a.value = rnorm(4)
>> rd
>>
>> RangedData with 4 rows and 1 value column across 2 spaces
>> space ranges | a.value
>> <character> <IRanges> |<numeric>
>> 1 1 [1, 10] | -0.6765515
>> 2 1 [3, 12] | 1.5406962
>> 3 2 [2, 11] | -1.2599696
>> 4 2 [4, 13] | 0.4971178
>>
>> But then I had to reboot my computer because by accident I tried
>> this on 100,000 ranges
>> and the value was actually a list, not a vector, and then the
>> re-cycling rule struck me:
>>
>>
>> rd$a.list = as.list(1:4)
>>
>> first everything seems fine and normal but if you try to print it:
>>
>> rd
>>
>> RangedData with 4 rows and 1 value column across 2 spaces
>> Error in .Method(..., deparse.level = deparse.level) :
>> number of rows of matrices must match (see arg 2)
>> or try to convert into a data.frame:
>>
>> as.data.frame(rd)
>>
>>   space start end width names a.list.1L a.list.2L a.list.3L a.list.4L
>> 1     1     1  10    10   a 1         1         2         3         4
>> 2     1     3  12    10   a 3         1         2         3         4
>> 3     2     2  11    10   a 2         1         2         3         4
>> 4     2     4  13    10   a 4         1         2         3         4
>>
>> When I tried this, R ran into some memory problems.
>>
>> This is just a warning, to make sure you really use a vector
>> here. Maybe something to put in the
>> type checking, or documentation?
>>
>> Anyway, thanks a lot again
>> Michael
>>
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>>
>