[R] as.data.frame(cbind()) transforming numeric to factor?

Marc Schwartz (via MN) mschwartz at mn.rr.com
Fri Aug 18 16:55:44 CEST 2006


On Fri, 2006-08-18 at 10:41 -0400, Tom Boonen wrote:
> Dear List,
> 
> why does as.data.frame(cbind()) transform numeric variables to
> factors, once one of the other variables used is a character vector?
> 
> #
> x.1 <- rnorm(10)
> x.2 <- c(rep("Test",10))
> Foo <- as.data.frame(cbind(x.1))
> is.factor(Foo$x.1)
> 
> Foo <- as.data.frame(cbind(x.1,x.2))
> is.factor(Foo$x.1)
> #
> 
> I assume there is a good reason for this, can somebody explain? Thanks.
> 
> Best,
> Tom

See the Note section of ?cbind, which states:

The method dispatching is not done via UseMethod(), but by C-internal
dispatching. Therefore, there is no need for, e.g., rbind.default.

The dispatch algorithm is described in the source file
(‘.../src/main/bind.c’) as

     1. For each argument we get the list of possible class memberships
        from the class attribute.
     2. We inspect each class in turn to see if there is an
        applicable method.
     3. If we find an applicable method we make sure that it is
        identical to any method determined for prior arguments. If it is
        identical, we proceed, otherwise we immediately drop through to
        the default code.

If you want to combine other objects with data frames, it may be
necessary to coerce them to data frames first. (Note that this algorithm
can result in calling the data frame method if the arguments are all
either data frames or vectors, and this will result in the coercion of
character vectors to factors.)
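
As a small illustration of that parenthetical note (a sketch only, not part of the help page; the objects df, v and res are made up here): when one argument is a data frame and the other a plain vector, cbind() ends up in the data frame method and returns a data frame rather than a matrix. Whether v then becomes a factor depends on the R version, since the stringsAsFactors default changed to FALSE in R 4.0.0.

```r
# Hypothetical objects, for illustration only
df <- data.frame(a = 1:3)
v  <- c("x", "y", "z")

res <- cbind(df, v)   # dispatches to the data frame method

is.data.frame(res)    # a data frame, not a character matrix
sapply(res, class)    # column 'a' keeps its integer class
```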


Thus, note the result of:

> str(cbind(x.1, x.2))
 chr [1:10, 1:2] "-0.265756038510064" "2.13220714034528" ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:2] "x.1" "x.2"

Since a matrix can only contain a single data type, the numeric vector
is coerced to character.

Then as.data.frame() converts each column of that character matrix to a
factor, which is its default behavior for character data.
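
The two steps can be checked by hand. The following is a sketch using the vectors from the question; the seed and the names m and x.1.back are arbitrary additions, and stringsAsFactors = TRUE is passed explicitly because R 4.0.0 and later no longer convert to factors by default.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
x.1 <- rnorm(10)
x.2 <- rep("Test", 10)

# Step 1: cbind() builds a matrix, so everything becomes character
m <- cbind(x.1, x.2)
is.matrix(m)
typeof(m)

# Step 2: the character columns are converted to factors
Foo <- as.data.frame(m, stringsAsFactors = TRUE)
is.factor(Foo$x.1)

# To get the numbers back, go through the labels, not the integer codes
x.1.back <- as.numeric(as.character(Foo$x.1))
all.equal(x.1.back, x.1)
```

Calling as.numeric() directly on the factor would return the internal level codes instead of the original values, which is why the conversion goes through as.character() first.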

If you want to create a data frame, do it this way:

> str(data.frame(x.1, x.2))
`data.frame':   10 obs. of  2 variables:
 $ x.1: num  -0.266  2.132  2.096 -0.128 -0.466 ...
 $ x.2: Factor w/ 1 level "Test": 1 1 1 1 1 1 1 1 1 1

or if you want to retain the character vector, use I():

> str(data.frame(x.1, I(x.2)))
`data.frame':   10 obs. of  2 variables:
 $ x.1: num  -0.266  2.132  2.096 -0.128 -0.466 ...
 $ x.2:Class 'AsIs'  chr [1:10] "Test" "Test" "Test" "Test" ...


See ?data.frame for more information.
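
In more recent versions of R, ?data.frame also documents a stringsAsFactors argument, which is an alternative to I() when you simply want plain character columns. A minimal sketch, assuming a version of R that has this argument (from R 4.0.0 on, FALSE is the default anyway):

```r
x.1 <- rnorm(10)
x.2 <- rep("Test", 10)

# Keep x.2 as a character column instead of a factor
Foo <- data.frame(x.1, x.2, stringsAsFactors = FALSE)
sapply(Foo, class)
```

Unlike I(), this leaves x.2 as an ordinary character vector without the 'AsIs' class.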

HTH,

Marc Schwartz


