[R] Unexpected behaviour as.data.frame

Ivan Calandra ivan.calandra at uni-hamburg.de
Mon May 16 11:35:44 CEST 2011


Actually, what would be even better would be an extra argument to 
specify the column names.
I don't think it's very difficult to implement and it would make things 
even easier.
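
Something along these lines is what I have in mind. This is only a rough sketch: the name create_data_frame and its arguments are hypothetical, not an existing function in base R or R.utils. It takes a named character vector of column modes, so the names double as the column names.

```r
# Hypothetical helper (not in base R or R.utils): build an "empty"
# data.frame from a named vector of column modes.
create_data_frame <- function(col_classes, nrow = 0L) {
  stopifnot(is.character(col_classes), !is.null(names(col_classes)))
  # vector(mode, length) gives zeros for numeric modes and "" for character
  cols <- lapply(col_classes, vector, length = nrow)
  # lapply() keeps the names of col_classes, which become the column names
  as.data.frame(cols, stringsAsFactors = FALSE)
}

df <- create_data_frame(c(id = "integer", label = "character", value = "double"),
                        nrow = 10)
str(df)
```

Passing the modes as a named vector keeps the column names and types in one place, instead of fixing the names afterwards.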
Ivan

On 5/16/2011 11:25, Ivan Calandra wrote:
> Thanks Santosh!
> The more I learn about R.utils, the more I think that many of its 
> functions should be included in the base distribution.
> Ivan
>
> On 5/16/2011 10:42, Santosh Srinivas wrote:
>> Hi Ivan, take a look at dataFrame() in R.utils ... is that what you want?
>>
>> from the help file:
>>
>> Examples
>>
>>    df<- dataFrame(colClasses=c(a="integer", b="double"), nrow=10)
>>    df[,1]<- sample(1:nrow(df))
>>    df[,2]<- rnorm(nrow(df))
>>    print(df)
>>
>> Thanks,
>> Santosh
>>
>> On Mon, May 16, 2011 at 1:42 PM, Ivan Calandra
>> <ivan.calandra at uni-hamburg.de>  wrote:
>>> I feel like I'm always asking this type of question, but is it possible
>>> to add a base function that allows creating an empty data.frame, as
>>> matrix() does?
>>>
>>> What I mean would be something like:
>>> create.data.frame(number_of_columns, mode_of_columns).
>>> I think it would make things easier than creating one or several matrices
>>> and then combining them.
>>>
>>> Is it possible; does it make sense?
>>>
>>> Ivan
>>>
>>> On 5/15/2011 22:17, Bert Gunter wrote:
>>>> Inline below.
>>>>
>>>> On Sun, May 15, 2011 at 11:11 AM, Jan van der Laan <rhelp at eoos.dds.nl> wrote:
>>>>> Thanks. I also noticed it myself minutes after sending my message to the
>>>>> list. My 'please ignore my question, it was just a stupid typo' message
>>>>> was sent with the wrong account and is now awaiting moderation.
>>>>>
>>>>> However, my other question still stands: what is the
>>>>> preferred/fastest/simplest way to create a data.frame with given column
>>>>> types and dimensions?
>>>> I do not know, but  why is simply
>>>>
>>>> data.frame(numeric(10), character(10), integer(10),
>>>> stringsAsFactors=FALSE)
>>>>
>>>> not acceptable? Note that if you had, say, 500 numeric (= double) and
>>>> 100 character columns to add, you might do something like:
>>>>
>>>> > z <- matrix(numeric(5000), nrow = 10)
>>>> > u <- matrix(character(1000), nrow = 10)
>>>> > frm <- data.frame(z, u, stringsAsFactors = FALSE) ## 600 columns
>>>> While this might save some typing, it may not be much more efficient
>>>> than typing it all out -- maybe just some parsing time is saved. You
>>>> can experiment and see.
>>>>
>>>> However, since a data.frame **is** a list with added attributes and a
>>>> great deal of the work of the constructor is in constructing and
>>>> checking these attributes (e.g. row and column names), I see nothing
>>>> terribly inefficient with what you did. It's just a bit obscure.  But
>>>> maybe someone with greater expertise will set us both straight.
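
The point that a data.frame is a list with added attributes can be illustrated with a minimal sketch. The column names a, b and c are made up for illustration, and building the object by hand skips the checks that data.frame() performs, so this is a demonstration rather than a recommended way to construct one.

```r
# Columns as a plain named list (names a, b, c are illustrative)
cols <- list(a = integer(10), b = character(10), c = numeric(10))

# Attach by hand the attributes that turn a list into a data.frame.
# No consistency checks are done here, so this is only safe when all
# columns are known to have the same length.
frm <- structure(cols,
                 class = "data.frame",
                 row.names = seq_len(10))

is.list(frm)        # still a list underneath
is.data.frame(frm)  # and now also a data.frame
dim(frm)
```
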
>>>>
>>>> Cheers,
>>>> Bert
>>>>
>>>>
>>>>> Regards,
>>>>> Jan
>>>>>
>>>>>
>>>>> On 05/15/2011 04:43 PM, Bert Gunter wrote:
>>>>>> In your post, you're missing the final "s" on the stringsAsFactors
>>>>>> argument in the d1 assignment. When I typed it correctly, it worked as
>>>>>> expected.
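
Why a misspelled argument goes unnoticed instead of raising an error can be sketched with a toy function. The function f below is made up for illustration and is not the actual as.data.frame code; it assumes, as appears to be the case for the list method, that stringsAsFactors is declared after `...`. Arguments declared after `...` must be matched by their exact name, so a misspelled name is silently absorbed into `...` and the default applies.

```r
# Toy function, made up for illustration: 'stringsAsFactors' comes after '...',
# and its default of TRUE mimics the R 2.12 behaviour discussed in this thread.
f <- function(x, ..., stringsAsFactors = TRUE) {
  extra <- list(...)
  list(stringsAsFactors = stringsAsFactors, absorbed = names(extra))
}

misspelled <- f(1, stringsAsFactor = FALSE)   # final "s" missing
correct    <- f(1, stringsAsFactors = FALSE)

misspelled$stringsAsFactors  # TRUE: the default was used
misspelled$absorbed          # "stringsAsFactor": swallowed by '...'
correct$stringsAsFactors     # FALSE
```
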
>>>>>>
>>>>>> -- Bert
>>>>>>
>>>>>> On Sun, May 15, 2011 at 4:25 AM, Jan van der Laan <rhelp at eoos.dds.nl> wrote:
>>>>>>> I use the following code to create two data.frames d1 and d2 from a
>>>>>>> list:
>>>>>>> types<- c("integer", "character", "double")
>>>>>>> nlines<- 10
>>>>>>> d1<- as.data.frame(lapply(types, do.call, list(nlines)),
>>>>>>> stringsAsFactor=FALSE)
>>>>>>> l2<- lapply(types, do.call, list(nlines))
>>>>>>> d2<- as.data.frame(l2, stringsAsFactors=FALSE)
>>>>>>>
>>>>>>> I would expect d1 and d2 to be the same, however, in d1 the second
>>>>>>> column is a factor while in d2 it is a character (which I would expect):
>>>>>>>
>>>>>>> > str(d1)
>>>>>>> 'data.frame':   10 obs. of  3 variables:
>>>>>>>   $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
>>>>>>>   $ c........................................: Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1
>>>>>>>   $ c.0..0..0..0..0..0..0..0..0..0.          : num  0 0 0 0 0 0 0 0 0 0
>>>>>>> > str(d2)
>>>>>>> 'data.frame':   10 obs. of  3 variables:
>>>>>>>   $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
>>>>>>>   $ c........................................: chr  "" "" "" "" ...
>>>>>>>   $ c.0..0..0..0..0..0..0..0..0..0.          : num  0 0 0 0 0 0 0 0 0 0
>>>>>>>
>>>>>>>
>>>>>>> As a different but related question: I use the commands above to create
>>>>>>> an 'empty' data.frame with specified column types and dimensions. I need
>>>>>>> this data.frame to pass on to my C++ routines. Is there a simpler or more
>>>>>>> elegant way of creating this data.frame?
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Jan
>>>>>>>
>>>>>>>
>>>>>>> PS:
>>>>>>> I am running R on 64-bit Ubuntu 11.04:
>>>>>>>
>>>>>>> > sessionInfo()
>>>>>>> R version 2.12.1 (2010-12-16)
>>>>>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>>>>>
>>>>>>> locale:
>>>>>>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>>>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>>>>   [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>>>>>>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>>>>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>>>
>>>>>>> attached base packages:
>>>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>>>
>>>>>>> ______________________________________________
>>>>>>> R-help at r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> PLEASE do read the posting guide
>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>>
>>>>
>>> -- 
>>> Ivan CALANDRA
>>> PhD Student
>>> University of Hamburg
>>> Biozentrum Grindel und Zoologisches Museum
>>> Abt. Säugetiere
>>> Martin-Luther-King-Platz 3
>>> D-20146 Hamburg, GERMANY
>>> +49(0)40 42838 6231
>>> ivan.calandra at uni-hamburg.de
>>>
>>> **********
>>> http://www.for771.uni-bonn.de
>>> http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php
>>>
>>>
>
