[R] issue building dataframes with matrices.

Daryl Morris darylm at u.washington.edu
Wed Aug 13 03:30:40 CEST 2008


Hello,
Is this a bug or a feature?  I am using R 2.7.1 on Apple OS X.


 > y <- matrix(1:3,nrow=3)     # y is a single-column matrix
 > df <-data.frame(x=1:3,y=y)
 > sapply(df,data.class)
        x         y
"numeric" "numeric"
 > df$yy <- y
 > sapply(df,data.class)
        x         y        yy
"numeric" "numeric"  "matrix"


I'm not sure why dataframes are allowed to have matrices as members.    
It's also weird to me that y & yy have different classes.  It seems like 
there has been a blurring of the line between lists and dataframes.   
When did dataframes start taking members other than vectors?
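
A minimal sketch of the difference described above, assuming current base R behaviour: data.frame() splits a matrix argument into its columns, `$<-` stores the value unchanged, and wrapping the matrix in I() asks data.frame() to keep it as one column. The names d1, d2 and d3 are illustrative and not part of the session above.

```r
y <- matrix(1:3, nrow = 3)      # single-column matrix, as in the post

# data.frame() splits a matrix argument into its columns, so a
# one-column matrix ends up as an ordinary vector column.
d1 <- data.frame(x = 1:3, y = y)
is.matrix(d1$y)     # FALSE

# `$<-` stores the value as given, so the matrix structure survives.
d2 <- data.frame(x = 1:3)
d2$yy <- y
is.matrix(d2$yy)    # TRUE

# I() protects the matrix, so data.frame() keeps it as a single column.
d3 <- data.frame(x = 1:3, yy = I(y))
is.matrix(d3$yy)    # TRUE
```

If a plain vector is wanted in every case, as.vector(y) before the assignment removes the matrix structure.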

This matters if, for example, one builds a dataframe to fit a model and
later wants to call predict on new data.  It takes some extra work to
avoid a type mismatch error.

 > df$out = df$x+df$y+df$yy + rnorm(3)
 > df
  x y yy       out
1 1 1  1  3.066348
2 2 2  2  5.516017
3 3 3  3 11.073452

 
 > glmout = glm(out~x+y+yy,data=df)
 > predict(glmout,newdata=data.frame(x=1:3,y=1:3,yy=1:3))
Error: variable 'yy' was fitted with type "nmatrix.1" but type "numeric" 
was supplied
 >
 > predict(glmout,newdata=data.frame(x=1:3,y=1:3,yy=matrix(1:3)))
Error: variable 'yy' was fitted with type "nmatrix.1" but type "numeric" 
was supplied
 > predict(glmout,newdata=df[,-4])
        1         2         3
 2.548387  6.551939 10.555491
Warning message:
In predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type ==  :
  prediction from a rank-deficient fit may be misleading
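
Two possible workarounds, sketched under assumptions: either give newdata a matrix column of the same shape as the one used in the fit, or strip the matrix structure before fitting. The data below are made up and deliberately not collinear, so the rank-deficiency warning seen above does not arise; train, nd, fit, fit2, p1 and p2 are illustrative names.

```r
set.seed(1)   # made-up data, not the values from the post

train <- data.frame(x = 1:5)
train$yy <- matrix(c(2, 1, 4, 3, 6), ncol = 1)   # one-column matrix, stored as-is
train$out <- train$x + train$yy[, 1] + rnorm(5)

fit <- glm(out ~ x + yy, data = train)

# Option 1: build newdata the same way, so yy is again a one-column matrix.
nd <- data.frame(x = 1:3)
nd$yy <- matrix(c(1, 2, 3), ncol = 1)
p1 <- predict(fit, newdata = nd)

# Option 2: drop the matrix structure before fitting; plain vectors then work.
train2 <- train
train2$yy <- as.vector(train2$yy)
fit2 <- glm(out ~ x + yy, data = train2)
p2 <- predict(fit2, newdata = data.frame(x = 1:3, yy = c(1, 2, 3)))
```

Both routes should give the same predictions, since the design matrix is the same in either case; only the type recorded for yy differs.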

I'm not really looking for a "solution", as I can already identify 
several workarounds.  I guess I'm mainly trying to figure out what the 
philosophy is here.

This is also weird to me:

 > df$yy <- as.data.frame(y)
 > df
  x y V1       out
1 1 1  1  3.066348
2 2 2  2  5.516017
3 3 3  3 11.073452
 > glmout = glm(out~x+y+V1,data=df)
Error in eval(expr, envir, enclos) : object "V1" not found
 > glmout = glm(out~x+y+yy,data=df)
Error in model.frame.default(formula = out ~ x + y + yy, data = df, 
drop.unused.levels = TRUE) :
  invalid type (list) for variable 'yy'
 > glmout = glm(out~x+y+yy$VI,data=df)
Error in model.frame.default(formula = out ~ x + y + yy$VI, data = df,  :
  invalid type (NULL) for variable 'yy$VI'

Is it impossible to build a model from a dataframe built this way?
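
One way it might be done, as a sketch rather than a definitive answer: the column yy is itself a data frame (a list), which model.frame rejects, so the vector inside it can be pulled out before fitting. The data and the names d and fit are made up for illustration.

```r
set.seed(1)   # made-up data, not the values from the post

d <- data.frame(x = 1:5)
# A data frame stored as a single column; its inner column is named V1.
d$yy <- as.data.frame(matrix(c(2, 1, 4, 3, 6), ncol = 1))
d$out <- d$x + d$yy$V1 + rnorm(5)

# Replace the nested data frame with the plain vector it contains.
d$yy <- d$yy[[1]]

fit <- glm(out ~ x + yy, data = d)
coef(fit)
```

After the replacement, yy is an ordinary numeric column, so predict can be given newdata built with data.frame() in the usual way.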


thanks, Daryl Morris
(Biostatistics, Univ. of Washington)


