[R] Using predict.lm()
Prof Brian Ripley
ripley at stats.ox.ac.uk
Thu Jun 17 16:25:30 CEST 2004
On Thu, 17 Jun 2004, Steven White wrote:
> Following the example in help(predict.lm):
>
> x <- rnorm(15)
> y <- x + rnorm(15)
> new <- data.frame(x = seq(-3, 3, 0.5))
> predict(lm(y ~ x), new)
>
> predicts the response elements corresponding to new$x as can be viewed by:
>
> plot(x,y)
> lines(new$x,predict(lm(y ~ x), new))
Note that the model is fitted to `x' and new contains `x'. You haven't
copied that.
> I am trying to extend this fitting and prediction over a variety of factors as
> follows:
>
> f<-rep(c("FIRST","SECOND"),each=15)
> f<-as.factor(f)
> x<-rep(rnorm(15),2)
> y<-x+rnorm(length(x))
> old<-data.frame(f=f,x=x,y=y)
> new<-data.frame(f=rep(levels(f),each=length(seq(-4,4,0.2))),x=seq(-4,4,0.2))
>
> ...where the data frame new simply supplies a different domain from that of
> old. When I try to predict on the frame new using x and y, I get a response
> whose length matches the number of rows in new:
>
> predict(lm(y~x),new)
>
> but when I use the same variables from within the frame old,
That you have not done correctly: see ?lm.
> the frame new is ignored:
No, it is not ignored but it does not contain a variable named `old$x' and
your workspace does. newdata is the first place to look for variables,
but not the only place.
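
A minimal sketch of that lookup, reusing the data from the question. The seed and the names grid, fit_bad, bad_label and p_bad are added here for illustration only. It assumes a reasonably recent R, which may also emit a warning about the row mismatch; that warning is suppressed below.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
f <- factor(rep(c("FIRST", "SECOND"), each = 15))
x <- rep(rnorm(15), 2)
y <- x + rnorm(length(x))
old <- data.frame(f = f, x = x, y = y)
grid <- seq(-4, 4, 0.2)
new <- data.frame(f = rep(levels(f), each = length(grid)), x = grid)

fit_bad <- lm(old$y ~ old$x)

# The model's only predictor is literally named "old$x", not "x".
bad_label <- attr(terms(fit_bad), "term.labels")
print(bad_label)

# 'new' has no variable called old$x, so predict() falls back to the
# workspace, finds 'old' there, and predicts at the original x values.
p_bad <- suppressWarnings(predict(fit_bad, new))
print(c(predictions = length(p_bad), old = nrow(old), new = nrow(new)))
```

The prediction has one value per row of old because every variable in the formula was resolved outside newdata.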
> predict(lm(old$y~old$x),new)
>
> ...results in a response the length of old$x (presumably predicting over the
> values of old$x). Furthermore, this behavior also precludes using something
> more useful, i.e.:
>
> predict(lm(old$y~old$f/(1+old$x)-1),new)
>
> to return predictions over a number of factors over redefined domains. In my
> case, I am attempting to do 2nd order polynomial fitting over noisy data
> collected for a large number of factors (~85). The data were collected for
> each factor at convenient (and therefore dissimilar) points within a common
> domain, but I need to compare the responses of each factor at similar points
> within the common domain.
>
> I am obviously missing something here because I continue to be puzzled by the
> result. I had thought (perhaps erroneously) that lm() would return a model
> object that would permit prediction.
Indeed it does.
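
For the use case described above (a second-order polynomial per factor level, compared at common points), a rough sketch might look as follows. The data are simulated and every name in it (dat, g, grid, yhat, wide) is hypothetical; it illustrates the idea and is not the poster's actual analysis.

```r
set.seed(2)  # arbitrary seed for reproducibility

# Hypothetical data: three groups, each sampled at different x positions.
dat <- data.frame(
  g = factor(rep(c("A", "B", "C"), each = 20)),
  x = runif(60, 0, 10)
)
dat$y <- 1 + 0.5 * dat$x - 0.05 * dat$x^2 + rnorm(60, sd = 0.3)

# Separate intercept, linear and quadratic coefficient for each level of g.
# Variables are named plainly in the formula and supplied via data=.
fit <- lm(y ~ g/(x + I(x^2)) - 1, data = dat)

# Common grid: every level of g crossed with the same x values.
grid <- expand.grid(x = seq(0, 10, by = 1), g = levels(dat$g))
grid$g <- factor(grid$g, levels = levels(dat$g))
grid$yhat <- predict(fit, grid)

# One column per group, one row per common x value.
wide <- unstack(grid, yhat ~ g)
print(head(wide))
```

Because grid uses the same variable names as the formula (g and x), predict() evaluates the fitted curves at the grid points and not at the original data.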
> Indeed:
>
> lm(old$y~old$f/(1+old$x)-1)
>
> ...results in:
>
> Call:
> lm(formula = old$y ~ old$f/(1 + old$x) - 1)
>
> Coefficients:
>        old$fFIRST        old$fSECOND   old$fFIRST:old$x  old$fSECOND:old$x
>          -0.08489           -0.05839            1.15351            0.72981
>
> which clearly provides a model fit for each factor, and identifies the factor
> from which each model coefficient was extracted, so lm() does provide the
> capability to predict over the factors. It seems however (as nearly as I can
> tell), that predict simply ignores the frame new altogether, failing even to
> provide a warning.
Nope. You just haven't set new to match your fit.
> Is this the intended behavior? Have I missed something very simple or have a
> fundamental misunderstanding of how this should work?
Yes, yes. You should be using

    lm(y ~ f/(1+x)-1, data=old)

etc., although in your example you could omit data=old (since f, x and y
also exist in your workspace). That is in all good books on the S
language ....
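
A short sketch of the corrected call followed by prediction on a new domain. Here old is built without leaving f, x and y in the workspace, so data=old is actually needed. The seed and the names grid, fit and pred are illustrative additions.

```r
set.seed(1)  # arbitrary seed for reproducibility

# Training data: all variables live only inside the data frame.
old <- data.frame(
  f = factor(rep(c("FIRST", "SECOND"), each = 15)),
  x = rep(rnorm(15), 2)
)
old$y <- old$x + rnorm(nrow(old))

# New domain: same variable names (f, x) as used in the formula.
grid <- seq(-4, 4, 0.2)
new <- data.frame(
  f = factor(rep(levels(old$f), each = length(grid)), levels = levels(old$f)),
  x = rep(grid, times = nlevels(old$f))
)

# Plain variable names in the formula, data frame passed via data=.
fit <- lm(y ~ f/(1 + x) - 1, data = old)
pred <- predict(fit, new)

print(names(coef(fit)))
print(c(predictions = length(pred), new = nrow(new)))
```

The coefficients are now labelled fFIRST, fSECOND, fFIRST:x and fSECOND:x, and the prediction has one value per row of new.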
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595