[R] issue building dataframes with matrices.
Daryl Morris
darylm at u.washington.edu
Wed Aug 13 03:30:40 CEST 2008
Hello,
Is this a bug or a feature? I am using R 2.7.1 on Apple OS X.
> y <- matrix(1:3,nrow=3) # y is a single-column matrix
> df <-data.frame(x=1:3,y=y)
> sapply(df,data.class)
        x         y
"numeric" "numeric"
> df$yy <- y
> sapply(df,data.class)
        x         y        yy
"numeric" "numeric"  "matrix"
I'm not sure why data frames are allowed to have matrices as members.
It also seems odd to me that y and yy end up with different classes,
even though both were built from the same matrix. It seems like the line
between lists and data frames has been blurred. When did data frames
start accepting members other than vectors?
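For reference, here is a minimal sketch of the difference as I understand it: data.frame() splits a matrix argument into its columns unless it is wrapped in I(), while `$<-` stores the matrix unchanged. The object names (df_split, df_asis, df_assign, df_vec) are made up for illustration only.

```r
y <- matrix(1:3, nrow = 3)            # single-column matrix

# data.frame() splits a matrix into one plain vector column per matrix column
df_split <- data.frame(x = 1:3, y = y)

# wrapping in I() asks data.frame() to keep the matrix as a single column
df_asis <- data.frame(x = 1:3, y = I(y))

# `$<-` stores the value as given, so the column stays a matrix
df_assign <- data.frame(x = 1:3)
df_assign$y <- y

# dropping the dim attribute first stores an ordinary vector
df_vec <- data.frame(x = 1:3)
df_vec$y <- as.vector(y)

sapply(list(split = df_split$y, asis = df_asis$y,
            assign = df_assign$y, vec = df_vec$y), is.matrix)
```

So y and yy above differ only in how they were put into the data frame, not in what they were built from.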
This becomes an issue if, for example, one builds a data frame to fit a
model and then wants to use predict() on new data. You have to work a
bit to avoid a type mismatch error.
> df$out = df$x+df$y+df$yy + rnorm(3)
> df
  x y yy       out
1 1 1  1  3.066348
2 2 2  2  5.516017
3 3 3  3 11.073452
> glmout = glm(out~x+y+yy,data=df)
> predict(glmout,newdata=data.frame(x=1:3,y=1:3,yy=1:3))
Error: variable 'yy' was fitted with type "nmatrix.1" but type "numeric"
was supplied
>
> predict(glmout,newdata=data.frame(x=1:3,y=1:3,yy=matrix(1:3)))
Error: variable 'yy' was fitted with type "nmatrix.1" but type "numeric"
was supplied
> predict(glmout,newdata=df[,-4])
        1         2         3
 2.548387  6.551939 10.555491
Warning message:
In predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type == :
prediction from a rank-deficient fit may be misleading
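Two of the workarounds I have in mind, sketched on made-up data (the values and the names d, nd, d2, fit, fit2 are illustrative, and the predictors are deliberately not collinear so the rank-deficiency warning does not get in the way): either give newdata a matrix column of the same shape by assigning it with `$<-`, or store a plain vector before fitting.

```r
set.seed(1)
d <- data.frame(x = c(1, 2, 3, 4, 5, 6))
d$yy <- matrix(c(2, 1, 4, 3, 6, 8), ncol = 1)   # stored as a one-column matrix
d$out <- d$x + d$yy[, 1] + rnorm(6)
fit <- glm(out ~ x + yy, data = d)

# Workaround 1: build newdata so that yy is again a one-column matrix
nd <- data.frame(x = c(1, 2, 3))
nd$yy <- matrix(c(2, 1, 4), ncol = 1)
p1 <- predict(fit, newdata = nd)

# Workaround 2: strip the dim attribute before fitting, then use plain vectors
d2 <- d
d2$yy <- as.vector(d$yy)
fit2 <- glm(out ~ x + yy, data = d2)
p2 <- predict(fit2, newdata = data.frame(x = c(1, 2, 3), yy = c(2, 1, 4)))

all.equal(unname(p1), unname(p2))   # both routes should give the same predictions
```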
I'm not really looking for a "solution", as I can already identify
several workarounds. I guess I'm mainly trying to figure out what the
philosophy is here.
This is also weird to me:
> df$yy <- as.data.frame(y)
> df
  x y V1       out
1 1 1  1  3.066348
2 2 2  2  5.516017
3 3 3  3 11.073452
> glmout = glm(out~x+y+V1,data=df)
Error in eval(expr, envir, enclos) : object "V1" not found
> glmout = glm(out~x+y+yy,data=df)
Error in model.frame.default(formula = out ~ x + y + yy, data = df,
drop.unused.levels = TRUE) :
invalid type (list) for variable 'yy'
> glmout = glm(out~x+y+yy$VI,data=df)
Error in model.frame.default(formula = out ~ x + y + yy$VI, data = df, :
invalid type (NULL) for variable 'yy$VI'
Is it impossible to build a model from a dataframe built this way?
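One way around it, sketched on made-up data: the yy column here is itself a data frame, so its inner column can be copied to the top level and the nested column dropped before fitting. The values and the name d are illustrative; V1 is the default column name that as.data.frame() gives an unnamed matrix column.

```r
y <- matrix(c(2, 1, 4, 3, 6), ncol = 1)
d <- data.frame(x = c(1, 2, 3, 4, 5),
                out = c(3.1, 2.9, 7.2, 6.8, 11.1))
d$yy <- as.data.frame(y)          # yy is a nested data frame with one column, V1
nested_ok <- is.data.frame(d$yy)

# Flatten: copy the inner column to the top level, then drop the nested column
d$V1 <- d$yy$V1
d$yy <- NULL

fit <- glm(out ~ x + V1, data = d)
coef(fit)
```

After flattening, V1 is an ordinary numeric column, so the formula can find it.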
thanks, Daryl Morris
(Biostatistics, Univ. of Washington)