[R] Unexpected behaviour as.data.frame

Jan van der Laan rhelp at eoos.dds.nl
Mon May 16 12:00:04 CEST 2011


Santosh, Ivan,

This is also what I was looking for. Thanks. Looking at the source of
dataFrame.default, it seems that it uses the same approach as I did:
first create a list, then a data.frame from that list. I think I'll stick
with the code I already had, as I don't want another dependency (several,
in fact, in the case of R.utils). But thanks again for pointing it out.

Jan
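Jan's list-then-data.frame approach can be wrapped in a small base-R helper for anyone who, like him, would rather avoid the R.utils dependency. This is a minimal sketch under stated assumptions: the name empty_data_frame and its arguments are hypothetical (neither base R nor R.utils provides them) and were chosen to mirror the colClasses/nrow call in the R.utils example quoted below. Naming the list before converting it also avoids deparsed column names such as c.0L..0L..., which appear in the str() output further down.

```r
## Hypothetical helper (not part of base R or R.utils): build a data.frame
## with 'nrow' rows whose columns have the modes given in 'colClasses'.
empty_data_frame <- function(colClasses, nrow = 0L) {
  # One zero-filled vector per column, e.g. vector("integer", 10);
  # lapply() keeps the names of a named 'colClasses' vector
  cols <- lapply(colClasses, function(mode) vector(mode, length = nrow))
  # Fall back to V1, V2, ... when no column names were supplied
  if (is.null(names(cols))) names(cols) <- paste0("V", seq_along(cols))
  # stringsAsFactors spelled out in full, so character columns stay character
  as.data.frame(cols, stringsAsFactors = FALSE)
}

d <- empty_data_frame(c(a = "integer", b = "character", c = "double"),
                      nrow = 10)
str(d)
```

Passing an unnamed vector of modes gives columns V1, V2, and so on. The full spelling of stringsAsFactors matters: the misspelling discussed below is exactly what turned the character column into a factor.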

On 05/16/2011 10:42 AM, Santosh Srinivas wrote:
> Hi Ivan, take a look at dataFrame in R.utils ... is that what you want?
>
> from the help file:
>
> Examples
>
>    df <- dataFrame(colClasses=c(a="integer", b="double"), nrow=10)
>    df[,1] <- sample(1:nrow(df))
>    df[,2] <- rnorm(nrow(df))
>    print(df)
>
> Thanks,
> Santosh
>
> On Mon, May 16, 2011 at 1:42 PM, Ivan Calandra
> <ivan.calandra at uni-hamburg.de>  wrote:
>> I feel like I'm always asking this type of question, but is it possible to
>> add a base function that allows creating an empty data.frame, as matrix()
>> does?
>>
>> What I mean would be something like:
>> create.data.frame(number_of_columns, mode_of_columns).
>> I think it would make things easier than creating one or several matrices
>> and then combining them.
>>
>> Is it possible; does it make sense?
>>
>> Ivan
>>
>> On 5/15/2011 22:17, Bert Gunter wrote:
>>> Inline below.
>>>
>>> On Sun, May 15, 2011 at 11:11 AM, Jan van der Laan <rhelp at eoos.dds.nl>
>>> wrote:
>>>> Thanks. I noticed it myself minutes after sending my message to the
>>>> list. My 'please ignore my question, it was just a stupid typo' message
>>>> was sent with the wrong account and is now awaiting moderation.
>>>>
>>>> However, my other question still stands: what is the
>>>> preferred/fastest/simplest way to create a data.frame with given column
>>>> types and dimensions?
>>> I do not know, but why is simply
>>>
>>> data.frame(numeric(10), character(10), integer(10),
>>> stringsAsFactors=FALSE)
>>>
>>> not acceptable? Note that if you had, say, 500 numeric (= double) and
>>> 100 character columns to add, you might do something like:
>>>
>>>> z <- matrix(numeric(5000), nrow = 10)
>>>> u <- matrix(character(1000), nrow = 10)
>>>> frm <- data.frame(z, u, stringsAsFactors = FALSE) ## 600 columns
>>> While this might save some typing, it may not be much more efficient
>>> than typing it all out -- maybe just some parsing time is saved. You
>>> can experiment and see.
>>>
>>> However, since a data.frame **is** a list with added attributes and a
>>> great deal of the work of the constructor is in constructing and
>>> checking these attributes (e.g. row and column names), I see nothing
>>> terribly inefficient with what you did. It's just a bit obscure.  But
>>> maybe someone with greater expertise will set us both straight.
>>>
>>> Cheers,
>>> Bert
>>>
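Bert's observation that a data.frame is a list with added attributes can be checked directly. The sketch below is an illustration, not a recommended replacement for data.frame(): it inspects the attributes of a small data frame and then builds the same object with structure(), in the form dput() prints for a data frame. Here c(NA, -3L) is R's compact internal encoding of the automatic row names 1:3. Skipping the constructor also skips its checks, so keeping the columns the same length is up to the caller.

```r
# A small data.frame built the usual way
d <- data.frame(a = integer(3), b = character(3), stringsAsFactors = FALSE)

typeof(d)        # "list": the underlying object is a plain list
attributes(d)    # names, class ("data.frame") and row.names

# The same object built directly from a list plus attributes, bypassing
# the constructor (equal column lengths are NOT verified here)
d2 <- structure(list(a = integer(3), b = character(3)),
                class = "data.frame", row.names = c(NA, -3L))

identical(d, d2)
```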
>>>
>>>> Regards,
>>>> Jan
>>>>
>>>>
>>>> On 05/15/2011 04:43 PM, Bert Gunter wrote:
>>>>> In your post, you're missing the final "s" on the stringsAsFactors
>>>>> argument in the d1 assignment. When I typed it correctly, it works as
>>>>> expected.
>>>>>
>>>>> -- Bert
>>>>>
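The misspelling was silently ignored, rather than reported as an error, because of how R matches arguments: formal arguments that come after ... can only be matched by their exact name. Assuming, as the R sources suggest, that stringsAsFactors follows ... in the signature of as.data.frame.list(), the call with stringsAsFactor=FALSE passed that argument into ..., where it apparently had no effect. The toy functions below are hypothetical and only illustrate the matching rule. Note also that since R 4.0.0 the default is stringsAsFactors = FALSE, so the same typo no longer yields a factor column.

```r
# Hypothetical toy functions illustrating R's argument-matching rules.
# 'flag' comes AFTER '...': it can only be matched by its exact name.
after_dots <- function(x, ..., flag = TRUE) flag
# 'flag' comes BEFORE '...': partial names such as 'fla' still match it.
before_dots <- function(x, flag = TRUE, ...) flag

after_dots(1, fla = FALSE)   # TRUE:  'fla' is absorbed by '...' and ignored
after_dots(1, flag = FALSE)  # FALSE: the exact name matches
before_dots(1, fla = FALSE)  # FALSE: partial matching applies
```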
>>>>> On Sun, May 15, 2011 at 4:25 AM, Jan van der Laan <rhelp at eoos.dds.nl>
>>>>> wrote:
>>>>>> I use the following code to create two data.frames d1 and d2 from a
>>>>>> list:
>>>>>> types<- c("integer", "character", "double")
>>>>>> nlines<- 10
>>>>>> d1<- as.data.frame(lapply(types, do.call, list(nlines)),
>>>>>> stringsAsFactor=FALSE)
>>>>>> l2<- lapply(types, do.call, list(nlines))
>>>>>> d2<- as.data.frame(l2, stringsAsFactors=FALSE)
>>>>>>
>>>>>> I would expect d1 and d2 to be the same; however, in d1 the second
>>>>>> column is a factor, while in d2 it is a character vector (which is
>>>>>> what I would expect):
>>>>>>
>>>>>>> str(d1)
>>>>>> 'data.frame':   10 obs. of  3 variables:
>>>>>>   $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
>>>>>>   $ c........................................: Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1
>>>>>>   $ c.0..0..0..0..0..0..0..0..0..0.          : num  0 0 0 0 0 0 0 0 0 0
>>>>>>> str(d2)
>>>>>> 'data.frame':   10 obs. of  3 variables:
>>>>>>   $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
>>>>>>   $ c........................................: chr  "" "" "" "" ...
>>>>>>   $ c.0..0..0..0..0..0..0..0..0..0.          : num  0 0 0 0 0 0 0 0 0 0
>>>>>>
>>>>>>
>>>>>> As a different but related question: I use the commands above to
>>>>>> create an 'empty' data.frame with specified column types and
>>>>>> dimensions. I need this data.frame to pass on to my C++ routines. Is
>>>>>> there a simpler or more elegant way of creating this data.frame?
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Jan
>>>>>>
>>>>>>
>>>>>> PS:
>>>>>> I am running R on 64-bit Ubuntu 11.04:
>>>>>>
>>>>>>> sessionInfo()
>>>>>> R version 2.12.1 (2010-12-16)
>>>>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>>>>
>>>>>> locale:
>>>>>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>>>   [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>>>>>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>>>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>>
>>>>>> attached base packages:
>>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help at r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>
>>>
>> --
>> Ivan CALANDRA
>> PhD Student
>> University of Hamburg
>> Biozentrum Grindel und Zoologisches Museum
>> Abt. Säugetiere
>> Martin-Luther-King-Platz 3
>> D-20146 Hamburg, GERMANY
>> +49(0)40 42838 6231
>> ivan.calandra at uni-hamburg.de
>>
>> **********
>> http://www.for771.uni-bonn.de
>> http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php
>>


