[R] Problem with as.data.frame when an extra attribute is present

Prof Brian Ripley ripley at stats.ox.ac.uk
Sat Nov 11 06:34:38 CET 2006


On Fri, 10 Nov 2006, Frank E Harrell Jr wrote:

> Prof Brian Ripley wrote:
>> It's quite intentional, as it is the documented behaviour of data.frame:
>>
>>      Objects passed to 'data.frame' should have the same number of
>>      rows, but atomic vectors, factors and character vectors protected
>>      by 'I' will be recycled a whole number of times if necessary.
>>
>>> data.frame(a = structure(1, label="foo"), b = c(2, 3))
>> Error in data.frame(a = structure(1, label = "foo"), b = c(2, 3)) :
>>         arguments imply differing number of rows: 1, 2
>>
>> It is safe to replicate a vector without any attributes, but not safe to
>> replicate this 'a': you will have to do it yourself if you know it is
>> safe.  How is anyone to know you meant 'label' to apply to the whole
>> vector and not the single element of the vector (if you did)?
>
> Thanks Brian for clarifying that.  Is there a way to use a specially
> written as.data.frame.labelled function to do this?  I assume there is
> no way to use I() here.

No, as the replication is done in data.frame after calling as.data.frame 
on each list component (and it has to be that way as there is no way to 
predict how many rows as.data.frame will return).  I() is only relevant 
for character vectors of class "AsIs" (and not even for classes inheriting 
from AsIs).

I toyed with the idea that we could try to make use of a rep() method 
here, but that could fall back to the default method and I don't see how 
to avoid letting unsafe cases through.
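
For anyone who does know the attribute applies to the whole vector, 
"doing it yourself" might look something like the sketch below.  It is 
only an illustration: rep_keep_label is a hypothetical helper, not part 
of base R or Hmisc, and it assumes 'label' is the only attribute that 
needs to survive the replication.

```r
# Hypothetical helper (not in base R or Hmisc): recycle x to length n and
# re-attach its 'label' attribute.  Assumes the label describes the whole
# vector, so copying it onto the longer vector is safe.
rep_keep_label <- function(x, n) {
  lab <- attr(x, "label")
  # Only recycle a whole number of times, as data.frame itself requires.
  stopifnot(n %% length(x) == 0)
  # as.vector() drops the attributes so rep() sees a plain vector.
  out <- rep(as.vector(x), length.out = n)
  # Assigning NULL is a no-op, so unlabelled vectors pass through unchanged.
  attr(out, "label") <- lab
  out
}

w <- list(a = structure(1, label = "foo"), b = 2:3)

# Bring every component to the common length before building the data frame.
n <- max(sapply(w, length))
w[] <- lapply(w, rep_keep_label, n = n)

d <- as.data.frame(w)
```

Because every component already has the same length, data.frame has no 
recycling left to do, and the labelled column is passed through as it is.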

>
> Frank
>
>>
>>
>> On Thu, 9 Nov 2006, Frank E Harrell Jr wrote:
>>
>>> I have a problem when one of the vectors in a list needs to be
>>> replicated to have the appropriate length, and an attribute is present.
>>>
>>>> w <- list(a=1, b=2:3)
>>>> as.data.frame(w)
>>>   a b
>>> 1 1 2
>>> 2 1 3
>>>
>>>> attr(w$a,'label') <- 'foo'
>>>> as.data.frame(w)
>>> Error in data.frame(a = 1, b = c(2, 3), check.names = TRUE) :
>>>         arguments imply differing number of rows: 1, 2
>>>
>>> I usually use the Hmisc label function to make a variable of class
>>> 'labelled' and define as.data.frame.labelled as as.data.frame.vector,
>>> but that also fails here.  Any help appreciated.  -Frank
>>>
>>>> sessionInfo()
>>> R version 2.2.1, 2005-12-20, i486-pc-linux-gnu [also fails in 2.4.0]
>>>
>>> attached base packages:
>>> [1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets"
>>> [7] "base"
>>>
>>>
>>
>
>
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
