[R] Different results when converting a matrix to a data.frame

Wed Nov 16 18:20:34 CET 2016

> On Nov 16, 2016, at 8:43 AM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
> 
> I will start by admitting I don't know the answer to your question.
> 
> However, I am responding because I think this should not be an issue in real life use of R. Data frames are lists of distinct vectors, each of which has its own reason for being present in the data, and normally each has its own storage mode. Your use of a matrix as a short cut way to create many columns at once does not change this fundamental difference between data frames and matrices. You should not be surprised that putting the finishing touches on this transformation takes some personal attention. 
> 
> Normally you should give explicit names to each column using the argument names in the data.frame function. When using a matrix as a shortcut, you should either immediately follow the creation of the data frame with a names(DF)<- assignment, or wrap it in a setNames function call. 
> 
> setNames( data.frame(matrix(NA, 2, 2)), c( "ColA", "ColB" ) )
> 
> Note that using a matrix to create many columns is memory inefficient, because you start by setting aside a single block of memory (the matrix) and then you move that data column at a time to separate vectors for use in the data frame. If working with large data you might want to consider allocating each column separately from the beginning. 
> 
> N <- 2
> nms <- c( "A", "B" )
> as.data.frame( setNames( lapply( nms, function(n){ rep( NA, N ) } ), nms ) )
> 
> which is not as convenient, but illustrates that data frames are truly different than matrices.
> -- 
> Sent from my phone. Please excuse my brevity.
> 
> On November 16, 2016 7:20:38 AM PST, G.Maubach at weinwolf.de wrote:
>> Hi All,
>> 
>> I built an empty data frame to fill it with values later. I did the 
>> following:
>> 
>> -- cut --
>> matrix(NA, 2, 2)
>>    [,1] [,2]
>> [1,]   NA   NA
>> [2,]   NA   NA
>>> data.frame(matrix(NA, 2, 2))
>> X1 X2
>> 1 NA NA
>> 2 NA NA
>>> as.data.frame(matrix(NA, 2, 2))
>> V1 V2
>> 1 NA NA
>> 2 NA NA
>> -- cut --
>> 
>> Why does data.frame deliver different results than as.data.frame with 
>> regard to the variable names (V instead of X)?

They are two different functions, and they build default column names differently. It's fairly easy to see by looking at the code:

as.data.frame.matrix uses names(value) <- paste0("V", ic) when the matrix has no column names, whereas data.frame calls make.names, which prepends an "X" to invalid or missing names.
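
A minimal sketch of the two naming paths, run at the console. It assumes a matrix without column names, as in the original post; the object names m, df1 and df2 are mine and only there for illustration.

```r
m <- matrix(NA, 2, 2)            # no dimnames, hence no column names

# as.data.frame() dispatches to as.data.frame.matrix,
# which falls back to "V" plus the column index
df2 <- as.data.frame(m)
names(df2)                       # "V1" "V2"

# data.frame() is left with the bare column indices as provisional
# names and runs them through make.names(), which prepends "X"
df1 <- data.frame(m)
names(df1)                       # "X1" "X2"
make.names(c("1", "2"), unique = TRUE)   # "X1" "X2"

# check.names = FALSE skips the make.names() step, so the
# provisional names are kept as they are
names(data.frame(m, check.names = FALSE))
```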

As to why the authors did it this way, I'm unable to comment.
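
If the aim is simply to get the same names from either function, one option (a sketch, not the only way) is to give the matrix its column names up front, so that neither default is ever used. The names "ColA" and "ColB" are borrowed from Jeff's example; cols, m, d1 and d2 are illustrative.

```r
cols <- c("ColA", "ColB")        # illustrative column names

# supply column names through dimnames when the matrix is created
m <- matrix(NA, 2, 2, dimnames = list(NULL, cols))

d1 <- data.frame(m)
d2 <- as.data.frame(m)

names(d1)                        # "ColA" "ColB"
names(d2)                        # "ColA" "ColB"
identical(names(d1), names(d2))  # TRUE
```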

>> 
>> Kind regards
>> 
>> Georg
>> 

David Winsemius
Alameda, CA, USA