[R] Using predict.lm()
Steven White
swhite at aegis-semi.com
Thu Jun 17 16:04:19 CEST 2004
Greetings,
Following the example in help(predict.lm):
x <- rnorm(15)
y <- x + rnorm(15)
new <- data.frame(x = seq(-3, 3, 0.5))
predict(lm(y ~ x), new)
predicts the response elements corresponding to new$x as can be viewed by:
plot(x,y)
lines(new$x,predict(lm(y ~ x), new))
I am trying to extend this fitting and prediction over a variety of factors as
follows:
f<-rep(c("FIRST","SECOND"),each=15)
f<-as.factor(f)
x<-rep(rnorm(15),2)
y<-x+rnorm(length(x))
old<-data.frame(f=f,x=x,y=y)
new<-data.frame(f=rep(levels(f),each=length(seq(-4,4,0.2))),x=seq(-4,4,0.2))
...where the frame new simply covers a different domain from old. When I
predict on the frame new using the free-standing variables x & y, I get a
response whose length matches the number of rows in new:
predict(lm(y~x),new)
but when I use the same variables from within the frame old, the frame new is
ignored:
predict(lm(old$y~old$x),new)
...results in a response the length of old$x (presumably predicting over the
values of old$x). Furthermore, this behavior also precludes using something
more useful, i.e.:
predict(lm(old$y~old$f/(1+old$x)-1),new)
to return predictions over a number of factors over redefined domains. In my
case, I am attempting to do 2nd order polynomial fitting over noisy data
collected for a large number of factors (~85). The data were collected for
each factor at convenient (and therefore dissimilar) points within a common
domain, but I need to compare the responses of each factor at similar points
within the common domain.
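
For reference, the goal described above might be sketched as follows. This is a minimal sketch, not a confirmed fix: it assumes that writing the formula with bare column names and passing data = old lets predict() match the columns of new by name. The names grid, fit and pred and the seed are illustrative only, and two factor levels stand in for the ~85 in the real data.

```r
# Sketch only: `grid`, `fit`, and `pred` are illustrative names.
set.seed(42)  # arbitrary seed so the run is reproducible

# Rebuild data shaped like `old`: two factor levels, 15 points each.
f <- factor(rep(c("FIRST", "SECOND"), each = 15))
x <- rep(rnorm(15), 2)
y <- x + rnorm(length(x))
old <- data.frame(f = f, x = x, y = y)

# Common grid of x values, repeated once per factor level.
grid <- seq(-4, 4, 0.2)
new <- data.frame(
  f = factor(rep(levels(f), each = length(grid)), levels = levels(f)),
  x = rep(grid, times = nlevels(f))
)

# Bare column names plus data = old, so the model terms are `f` and `x`.
# This gives a separate intercept, slope, and quadratic term per level of f.
fit <- lm(y ~ 0 + f + f:x + f:I(x^2), data = old)

# One prediction per row of `new`, i.e. per (level, grid point) pair.
pred <- predict(fit, newdata = new)
length(pred) == nrow(new)
```

Because every level is evaluated on the same grid, the predictions for different levels can be compared point by point, e.g. with split(pred, new$f).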
I am obviously missing something here because I continue to be puzzled by the
result. I had thought (perhaps erroneously) that lm() would return a model
object that would permit prediction. Indeed:
lm(old$y~old$f/(1+old$x)-1)
...results in:
Call:
lm(formula = old$y ~ old$f/(1 + old$x) - 1)

Coefficients:
       old$fFIRST        old$fSECOND   old$fFIRST:old$x  old$fSECOND:old$x
         -0.08489           -0.05839            1.15351            0.72981
which clearly provides a model fit for each factor, and identifies the factor
from which each model coefficient was extracted, so lm() does provide the
capability to predict over the factors. It seems, however (as nearly as I can
tell), that predict() simply ignores the frame new altogether, failing even to
provide a warning.
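
A small check of how each model records its variable names may help explain this. The sketch below rests on the assumption that predict() looks up the model's variables by name in the new frame. The names fit_dollar and fit_data and the seed are illustrative only.

```r
# Sketch only: `fit_dollar` and `fit_data` are illustrative names.
set.seed(1)  # arbitrary seed so the run is reproducible

old <- data.frame(x = rnorm(15))
old$y <- old$x + rnorm(15)
new <- data.frame(x = seq(-3, 3, 0.5))

# Same fit written two ways.
fit_dollar <- lm(old$y ~ old$x)       # variables referenced through old$
fit_data   <- lm(y ~ x, data = old)   # bare names, looked up in `old`

# The model frame keeps the variable names exactly as written in the formula.
names(model.frame(fit_dollar))  # "old$y" "old$x"
names(model.frame(fit_data))    # "y" "x"

# `new` has a column called x but none called old$x, so only the second
# fit has a predictor that can be matched by name in `new`.
names(new)

length(predict(fit_data, newdata = new)) == nrow(new)
```

If this reading is right, the first form leaves predict() nothing to match in new, so it falls back to the original old$x values.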
Is this the intended behavior? Have I missed something very simple, or do I
have a fundamental misunderstanding of how this should work? Lastly, I'd
appreciate any suggestions that avoid the lengthy and wholly undesirable
"brute force" approach I am now considering.
Thanks & Best Regards,
Steve