[R] predict.lm(...,type="terms") question

Sun Sep 2 19:07:01 CEST 2012

Thank you all. My muddle about predict.lm(..., type = "terms") was evident even in the first sentence of my original posting:

> How can I actually use the output of 
> predict.lm(..., type="terms") to predict 
> new term values from new response values?

The answer is that I cannot: new response values, if included in newdata, are simply ignored by predict.lm, as well they should be.
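
A minimal sketch of this behaviour, using simulated data (the names d, fit, nd_x and nd_xy are made up for illustration and are not from the original thread). It suggests that a response column in newdata has no effect on the predictions, and that type = "terms" only splits each prediction into per-term contributions plus a constant:

```r
set.seed(1)

# Simulated calibration-style data: y depends linearly on x.
d <- data.frame(x = 1:10)
d$y <- 2 + 3 * d$x + rnorm(10)
fit <- lm(y ~ x, data = d)

# Two newdata frames with the same x; the second also carries a
# (deliberately absurd) response column.
nd_x  <- data.frame(x = c(2.5, 7.5))
nd_xy <- data.frame(x = c(2.5, 7.5), y = c(100, -100))

p_x  <- predict(fit, newdata = nd_x)
p_xy <- predict(fit, newdata = nd_xy)
all.equal(p_x, p_xy)   # the y column in newdata is not used

# type = "terms" returns one column per model term (centred), with
# the overall constant stored in the "constant" attribute.
tt <- predict(fit, newdata = nd_x, type = "terms")
rebuilt <- rowSums(tt) + attr(tt, "constant")
all.equal(unname(rebuilt), unname(p_x))   # terms + constant = prediction
```

Nothing here inverts the regression; getting x from a new y is the separate calibration problem.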

As for the calibration issue, I am reviewing literature now as suggested.

Though predict.lm performed to spec (no bug), may I suggest a minor
change to the ?predict.lm text?

Existing: 
 newdata  An optional data frame in which to 
          look for variables with  
          which to predict. If omitted, 
          the fitted values are used.
Proposed: 
 newdata  An optional data frame in which to 
          look for new values of terms with 
          which to predict. If omitted, the 
          fitted values are used.

-John Thaden, Ph.D.
 College Station, TX

--- On Sun, 9/2/12, peter dalgaard <pdalgd at gmail.com> wrote:

> From: peter dalgaard <pdalgd at gmail.com>
> Subject: Re: [R] predict.lm(...,type="terms") question
> To: "David Winsemius" <dwinsemius at comcast.net>
> Cc: "Rui Barradas" <ruipbarradas at sapo.pt>, r-help at r-project.org, "jjthaden" <jjthaden at flash.net>
> Date: Sunday, September 2, 2012, 1:35 AM
> 
> On Sep 2, 2012, at 03:38 , David Winsemius wrote:
> 
> > 
> > Why should predict not complain when it is offered a
> > newdata argument that does not contain a vector of values for
> > "x"? The whole point of the terms method of prediction is to
> > offer estimates for specific values of items on the RHS of
> > the formula. The OP seems to have trouble understanding that
> > point. Putting in a vector with the name of the LHS item
> > makes no sense to me. I certainly cannot see that any
> > particular behavior for this pathological input is described
> > for predict.lm in its help page, but throwing an error seems
> > perfectly reasonable to me.
> 
> Yes. Lots of confusion going on here. 
> 
> First, data= is _always_ used as the _first_ place to look
> for variables, if things are not in it, search continues
> into the formula's environment. To be slightly perverse,
> notice that even this works:
> 
> > y <- rnorm(10)
> > x <- rnorm(10)
> > d <- data.frame(z=rnorm(9))
> > lm(y ~ x, d)
> 
> Call:
> lm(formula = y ~ x, data = d)
> 
> Coefficients:
> (Intercept)            x  
>     -0.2760       0.2328  
> 
> Secondly, what is predict(..., type="terms") supposed to
> have to do with inverting a regression equation? That's just
> not what it does, it only splits the prediction formula into
> its constituent terms.
> 
> Thirdly; no, you do not invert a regression equation by
> regressing y on x. That only works if you can be sure that
> your new (x, y) are sampled from the same population as the
> data, which is not going to be the case if you are fitting
> to data with, say, selected equispaced x values. There's a
> whole literature on how to do this properly, Google e.g.
> "inverse calibration" for enlightenment.  
> 
> -- 
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd.mes at cbs.dk 
> Priv: PDalgd at gmail.com