[R] Unexpected behaviour as.data.frame

Ivan Calandra ivan.calandra at uni-hamburg.de
Mon May 16 11:43:01 CEST 2011


Forget my last email: I overlooked that the examples already show how to do this.
Ivan


On 5/16/2011 11:35, Ivan Calandra wrote:
> Actually, what would be even better would be an extra argument to 
> specify the column names.
> I don't think it's very difficult to implement and it would make 
> things even easier.
> Ivan
>
> On 5/16/2011 11:25, Ivan Calandra wrote:
>> Thanks Santosh!
>> The more I learn about R.utils, the more I think that many of its 
>> functions should be included in the base distribution.
>> Ivan
>>
>> On 5/16/2011 10:42, Santosh Srinivas wrote:
>>> Hi Ivan, take a look at dataFrame() in R.utils. Is that what you want?
>>>
>>> from the help file:
>>>
>>> Examples
>>>
>>>    df<- dataFrame(colClasses=c(a="integer", b="double"), nrow=10)
>>>    df[,1]<- sample(1:nrow(df))
>>>    df[,2]<- rnorm(nrow(df))
>>>    print(df)
>>>
>>> Thanks,
>>> Santosh
>>>
>>> On Mon, May 16, 2011 at 1:42 PM, Ivan Calandra
>>> <ivan.calandra at uni-hamburg.de>  wrote:
>>>> I feel like I'm always asking this type of question, but would it be
>>>> possible to add a base function that creates an empty data.frame,
>>>> the way matrix() does for matrices?
>>>>
>>>> What I mean would be something like:
>>>> create.data.frame(number_of_columns, mode_of_columns).
>>>> I think it would make things easier than creating one or several
>>>> matrices and then combining them.
>>>>
>>>> Is it possible; does it make sense?
>>>>
>>>> Ivan
>>>>
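A base-R sketch of the kind of constructor asked for above. The name create_data_frame and its arguments col_classes and nrow are hypothetical, not an existing base function; the named-vector interface is only modelled on the colClasses argument in the R.utils example.

```r
# Hypothetical helper (not part of base R): build a data.frame with the
# requested column types, filled with the default value of each type.
create_data_frame <- function(col_classes, nrow = 0L) {
  stopifnot(is.character(col_classes), length(nrow) == 1L, nrow >= 0)
  # vector() creates a vector of the given mode and length; lapply()
  # keeps the names of col_classes, which become the column names.
  cols <- lapply(col_classes, function(mode) vector(mode = mode, length = nrow))
  # Fall back to V1, V2, ... when no names were supplied.
  if (is.null(names(col_classes))) names(cols) <- paste0("V", seq_along(cols))
  as.data.frame(cols, stringsAsFactors = FALSE)
}

df <- create_data_frame(c(a = "integer", b = "double", c = "character"), nrow = 10)
str(df)
```

Naming the elements of col_classes also covers the wish for an argument that sets the column names.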
>>>> On 5/15/2011 22:17, Bert Gunter wrote:
>>>>> Inline below.
>>>>>
>>>>> On Sun, May 15, 2011 at 11:11 AM, Jan van der Laan<rhelp at eoos.dds.nl>
>>>>>   wrote:
>>>>>> Thanks. I also noticed it myself minutes after sending my message
>>>>>> to the list. My 'please ignore my question it was just a stupid
>>>>>> typo' message was sent with the wrong account and is now awaiting
>>>>>> moderation.
>>>>>>
>>>>>> However, my other question still stands: what is the
>>>>>> preferred/fastest/simplest way to create a data.frame with given
>>>>>> column types and dimensions?
>>>>> I do not know, but  why is simply
>>>>>
>>>>> data.frame(numeric(10), character(10), integer(10),
>>>>> stringsAsFactors=FALSE)
>>>>>
>>>>> not acceptable? Note that if you had, say, 500 numeric (= double)
>>>>> and 100 character columns to add, you might do something like:
>>>>>
>>>>> > z <- matrix(numeric(5000), nrow=10)
>>>>> > u <- matrix(character(1000), nrow=10)
>>>>> > frm <- data.frame(z, u, stringsAsFactors = FALSE) ## 600 columns
>>>>> While this might save some typing, it may not be much more efficient
>>>>> than typing it all out -- maybe just some parsing time is saved. You
>>>>> can experiment and see.
>>>>>
>>>>> However, since a data.frame **is** a list with added attributes and a
>>>>> great deal of the work of the constructor is in constructing and
>>>>> checking these attributes (e.g. row and column names), I see nothing
>>>>> terribly inefficient with what you did. It's just a bit obscure.  But
>>>>> maybe someone with greater expertise will set us both straight.
>>>>>
>>>>> Cheers,
>>>>> Bert
>>>>>
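Bert's two suggestions written out as a runnable sketch. The column names x, label and n are made up for illustration; the matrix sizes are the ones from his message.

```r
# Spelling the columns out directly (illustrative column names).
small <- data.frame(x = numeric(10), label = character(10), n = integer(10),
                    stringsAsFactors = FALSE)

# Many columns of the same type: build one matrix per type, then combine.
z <- matrix(numeric(5000), nrow = 10)    # 10 rows x 500 double columns
u <- matrix(character(1000), nrow = 10)  # 10 rows x 100 character columns
frm <- data.frame(z, u, stringsAsFactors = FALSE)

dim(frm)
table(vapply(frm, typeof, character(1)))

# A data.frame is a list carrying a few extra attributes.
is.list(small)
names(attributes(small))
```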
>>>>>
>>>>>> Regards,
>>>>>> Jan
>>>>>>
>>>>>>
>>>>>> On 05/15/2011 04:43 PM, Bert Gunter wrote:
>>>>>>> In your post, you're missing the final "s" on the stringsAsFactors
>>>>>>> argument in the d1 assignment. When I type it correctly, it works
>>>>>>> as expected.
>>>>>>>
>>>>>>> -- Bert
>>>>>>>
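A small check of the typo explanation, assuming the list method of as.data.frame() swallows unknown arguments through its `...`: the misspelled stringsAsFactor has no effect, so the result matches a call made without that argument. The list cols and its element names are illustrative.

```r
cols <- list(id = integer(3), label = character(3), value = double(3))

misspelled <- as.data.frame(cols, stringsAsFactor = TRUE)   # typo: no final "s"
no_arg     <- as.data.frame(cols)                           # session default
correct    <- as.data.frame(cols, stringsAsFactors = TRUE)  # spelled correctly

# The misspelled argument is silently ignored, so these two agree.
identical(misspelled, no_arg)

# Only the correctly spelled argument is honoured.
is.factor(correct$label)
```

The default of stringsAsFactors was TRUE in R 2.12.1 and has been FALSE since R 4.0.0, so the unexpected factor column in d1 only appears on older versions.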
>>>>>>> On Sun, May 15, 2011 at 4:25 AM, Jan van der Laan
>>>>>>> <rhelp at eoos.dds.nl> wrote:
>>>>>>>> I use the following code to create two data.frames d1 and d2
>>>>>>>> from a list:
>>>>>>>> types<- c("integer", "character", "double")
>>>>>>>> nlines<- 10
>>>>>>>> d1<- as.data.frame(lapply(types, do.call, list(nlines)),
>>>>>>>> stringsAsFactor=FALSE)
>>>>>>>> l2<- lapply(types, do.call, list(nlines))
>>>>>>>> d2<- as.data.frame(l2, stringsAsFactors=FALSE)
>>>>>>>>
>>>>>>>> I would expect d1 and d2 to be the same. However, in d1 the second
>>>>>>>> column is a factor, while in d2 it is a character (which is what
>>>>>>>> I would expect):
>>>>>>>>
>>>>>>>> > str(d1)
>>>>>>>> 'data.frame':   10 obs. of  3 variables:
>>>>>>>>   $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
>>>>>>>>   $ c........................................: Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1
>>>>>>>>   $ c.0..0..0..0..0..0..0..0..0..0.          : num  0 0 0 0 0 0 0 0 0 0
>>>>>>>> > str(d2)
>>>>>>>> 'data.frame':   10 obs. of  3 variables:
>>>>>>>>   $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
>>>>>>>>   $ c........................................: chr  "" "" "" "" ...
>>>>>>>>   $ c.0..0..0..0..0..0..0..0..0..0.          : num  0 0 0 0 0 0 0 0 0 0
>>>>>>>>
>>>>>>>>
>>>>>>>> As a different but related question: I use the commands above to
>>>>>>>> create an 'empty' data.frame with specified column types and
>>>>>>>> dimensions. I need this data.frame to pass on to my C++ routines.
>>>>>>>> Is there a simpler or more elegant way of creating this data.frame?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Jan
>>>>>>>>
>>>>>>>>
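How the lapply()/do.call() idiom above builds the columns, as a sketch with one assumed addition: the names col_integer and so on are invented here so that the columns do not end up with deparsed names such as c.0L..0L...

```r
types  <- c("integer", "character", "double")
nlines <- 10

# do.call("integer", list(10)) evaluates the same call as integer(10),
# so each element of 'cols' is a zero-filled (or empty-string) vector.
cols <- lapply(types, do.call, list(nlines))

# Hypothetical names; without them as.data.frame() has to invent some.
names(cols) <- paste0("col_", types)

d <- as.data.frame(cols, stringsAsFactors = FALSE)
str(d)
```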
>>>>>>>> PS:
>>>>>>>> I am running R on 64 bit Ubuntu 11.04:
>>>>>>>>
>>>>>>>>> sessionInfo()
>>>>>>>> R version 2.12.1 (2010-12-16)
>>>>>>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>>>>>>
>>>>>>>> locale:
>>>>>>>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>>>>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>>>>>   [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>>>>>>>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>>>>>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>>>>
>>>>>>>> attached base packages:
>>>>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>>>>
>>>>>>>> ______________________________________________
>>>>>>>> R-help at r-project.org mailing list
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>> PLEASE do read the posting guide
>>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>>>
>>>>>
>>>> -- 
>>>> Ivan CALANDRA
>>>> PhD Student
>>>> University of Hamburg
>>>> Biozentrum Grindel und Zoologisches Museum
>>>> Abt. Säugetiere
>>>> Martin-Luther-King-Platz 3
>>>> D-20146 Hamburg, GERMANY
>>>> +49(0)40 42838 6231
>>>> ivan.calandra at uni-hamburg.de
>>>>
>>>> **********
>>>> http://www.for771.uni-bonn.de
>>>> http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php
>>>>
>>>>
>>
>



