[R] Using predict.lm()
Prof Brian Ripley
ripley at stats.ox.ac.uk
Thu Jun 17 16:25:30 CEST 2004
On Thu, 17 Jun 2004, Steven White wrote:
> Following the example in help(predict.lm):
> x <- rnorm(15)
> y <- x + rnorm(15)
> new <- data.frame(x = seq(-3, 3, 0.5))
> predict(lm(y ~ x), new)
> predicts the response elements corresponding to new$x as can be viewed by:
> lines(new$x,predict(lm(y ~ x), new))
Note that the model is fitted to `x' and new contains `x'. You haven't
> I am trying to extend this fitting and prediction over a variety of factors as
> ...where variable new simply substitutes a differing domain than old. When I
> try to predict on the frame new using x & y, I get a response that
> corresponds to the length of new:
> but when I use the same variables from within the frame old,
That you have not done correctly: see ?lm.
> the frame new is ignored:
No, it is not ignored, but it does not contain a variable named `old$x' and
your workspace does. newdata is the first place predict() looks for
variables, but not the only place.
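
The lookup rule can be seen in a small sketch. The data here are simulated as in the help(predict.lm) example; the seed and the names fit_bad, fit_ok, p_bad and p_ok are illustrative assumptions, not part of the original question. Recent versions of R may also warn about the row mismatch in the first case, so the call is wrapped in suppressWarnings().

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

old <- data.frame(x = rnorm(15))
old$y <- old$x + rnorm(15)
new <- data.frame(x = seq(-3, 3, 0.5))  # 13 rows

# The formula names the variables `old$x` and `old$y`. `new` has no
# column called `old$x`, so the lookup falls through to `old` in the
# calling environment and the predictions are for the original 15 rows.
fit_bad <- lm(old$y ~ old$x)
p_bad <- suppressWarnings(predict(fit_bad, new))
length(p_bad)  # 15

# The formula names plain `x` and `y`, resolved in `old` at fit time
# and in `new` at prediction time.
fit_ok <- lm(y ~ x, data = old)
p_ok <- predict(fit_ok, new)
length(p_ok)   # 13
```

In the first case the result is just the fitted values for old, which is why the frame new appears to be ignored.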
> ...results in a response the length of old$x (presumably predicting over the
> values of old$x). Furthermore, this behavior also precludes using something
> more useful, i.e.:
> to return predictions over a number of factors over redefined domains. In my
> case, I am attempting to do 2nd order polynomial fitting over noisy data
> collected for a large number of factors (~85). The data were collected for
> each factor at convenient (and therefore dissimilar) points within a common
> domain, but I need to compare the responses of each factor at similar points
> within the common domain.
> I am obviously missing something here because I continue to be puzzled by the
> result. I had thought (perhaps erroneously) that lm() would return a model
> object that would permit prediction.
Indeed it does.
> ...results in:
> lm(formula = old$y ~ old$f/(1 + old$x) - 1)
> old$fFIRST old$fSECOND old$fFIRST:old$x old$fSECOND:old$x
> -0.08489 -0.05839 1.15351 0.72981
> which clearly provides a model fit for each factor, and identifies the factor
> from which each model coefficient was extracted, so lm() does provide the
> capability to predict over the factors. It seems however (as nearly as I can
> tell), that predict simply ignores the frame new altogether, failing even to
> provide a warning.
Nope. You just haven't set new to match your fit.
> Is this the intended behavior? Have I missed something very simple or have a
> fundamental misunderstanding of how this should work?
Yes, yes. You should be using
lm(y ~ f/(1+x)-1, data=old)
etc, although in your example you could omit data=old. That is in all
good books on the S language ....
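
A minimal sketch of that usage, assuming a frame old with columns f, x and y: the simulated data, the seed, the grid and the names fit, fit2 and by_factor are made up for illustration. The quadratic formula is one possible way to write the 2nd order per-factor fit mentioned in the question, not something taken from the original code.

```r
set.seed(2)  # arbitrary seed, only so the sketch is reproducible

# Simulated stand-in for `old`: two factor levels observed at
# different x values within a common domain.
old <- data.frame(
  f = factor(rep(c("FIRST", "SECOND"), each = 10)),
  x = runif(20, -3, 3)
)
old$y <- ifelse(old$f == "FIRST", 1.2, 0.7) * old$x + rnorm(20, sd = 0.3)

# Fit with bare variable names and data=, so that predict() can find
# the same names in newdata.
fit <- lm(y ~ f/(1 + x) - 1, data = old)

# newdata must contain columns named `f` and `x`; expand.grid() gives
# every factor level at every point of a common grid.
new <- expand.grid(
  f = factor(levels(old$f), levels = levels(old$f)),
  x = seq(-3, 3, 0.5)
)
new$pred <- predict(fit, new)

# Assumed extension: a separate quadratic in x for each level of f.
fit2 <- lm(y ~ f/(x + I(x^2)) - 1, data = old)
new$pred2 <- predict(fit2, new)

# One vector of predictions per factor level, aligned on the same grid.
by_factor <- split(new$pred2, new$f)
```

Because new carries both f and x, each row of the result corresponds to one factor level at one grid point, which allows the comparison at common points.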
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595