[Rd] termplot & predict.lm. some details about calculating predicted values with "other variables set at the mean"
Paul Johnson
pauljohn32 at gmail.com
Wed Dec 14 07:30:33 CET 2011
I'm making some functions to illustrate regressions and I have been
staring at termplot and predict.lm and residuals.lm to see how this is
done. I've wondered who wrote predict.lm originally, because I think
it is very clever.
I got interested because termplot doesn't work with models that include interaction terms:
> m1 <- lm(y ~ x1*x2)
> termplot(m1)
Error in `[.data.frame`(mf, , i) : undefined columns selected
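One plausible reading of that error, sketched below on simulated data: predict() with type = "terms" returns a column for the interaction term, but the model frame has no column by that name, so indexing the model frame by term label fails. The object names (dat, m_int, missing_cols) and the data are assumptions made up for illustration, and recent versions of R may respond with a warning about interactions rather than this exact error.

```r
# Simulated data; dat, x1, x2, y and m_int are illustrative assumptions.
set.seed(1)
dat <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
dat$y <- 1 + dat$x1 + dat$x2 + 0.5 * dat$x1 * dat$x2 + rnorm(50)

m_int <- lm(y ~ x1 * x2, data = dat)

# predict(type = "terms") returns one column per model term,
# including the interaction term.
term_names <- colnames(predict(m_int, type = "terms"))

# The model frame holds the variables only, not the interaction.
frame_names <- names(model.frame(m_int))

# Term labels with no matching column in the model frame.
missing_cols <- setdiff(term_names, frame_names)
print(missing_cols)
```

Any code that looks up each term label as a column of the model frame would trip on the labels listed in missing_cols.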
Digging into that, I realized some surprising implications of
nonlinear formulas.
This issue arises when there are math functions in the regression
formula. The question focuses on what we mean by the mean of "x" when
we are discussing predictions and deviations.
Suppose one fits:
m1 <- lm (y ~ x1 + log(x2), data=dat)
I had thought the partial residual was calculated with reference to
log(mean(x2)), the log of the mean of x2. But that's not right. It is
calculated with reference to mean(log(x2)). That seems misleading,
because termplot draws its graph with x2 (not "log(x2)") on the
horizontal axis. I should not say misleading. Rather, it is unexpected.
I think users who want the reference value in the plot of x2 to be the
mean of x2 have a legitimate concern here.
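The difference between the two reference values can be checked numerically. The following is a minimal sketch on simulated data (the data, the seed, and names such as term_mean_of_log are assumptions for illustration); it compares the log(x2) column from predict(type = "terms") with a term centered at mean(log(x2)) and with one centered at log(mean(x2)).

```r
# Simulated data; all names and values here are illustrative assumptions.
set.seed(2)
dat <- data.frame(x1 = rnorm(100), x2 = runif(100, 1, 10))
dat$y <- 2 + 0.5 * dat$x1 + 1.5 * log(dat$x2) + rnorm(100)

m1 <- lm(y ~ x1 + log(x2), data = dat)
tt <- predict(m1, type = "terms")
b  <- coef(m1)[["log(x2)"]]

# The log(x2) term centered two different ways.
term_mean_of_log <- b * (log(dat$x2) - mean(log(dat$x2)))
term_log_of_mean <- b * (log(dat$x2) - log(mean(dat$x2)))

# Which centering does predict.lm use?
same_as_mean_of_log <- isTRUE(all.equal(unname(tt[, "log(x2)"]), term_mean_of_log))
same_as_log_of_mean <- isTRUE(all.equal(unname(tt[, "log(x2)"]), term_log_of_mean))
print(c(mean_of_log = same_as_mean_of_log, log_of_mean = same_as_log_of_mean))

# The centered terms plus the "constant" attribute rebuild the fitted values.
rebuilt <- rowSums(tt) + attr(tt, "constant")
print(all.equal(unname(rebuilt), unname(fitted(m1))))
```

In this sketch the term column agrees with the mean(log(x2)) version only, which is consistent with centering being applied to the columns of the design matrix rather than to the raw variable.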
With a more elaborate formula, the mismatch gets more confusing.
Suppose the regression formula is
m2 <- lm (y ~ x1 + poly(x2,3), data=dat)
The model frame has these variables:
y x1 poly(x2, 3).1 poly(x2, 3).2 poly(x2, 3).3
and the partial residual calculation for variable x1, which I had
expected would be based on a polynomial transformation of mean(x2), is
instead based on the weighted sum of the means of the three poly(x2, 3)
columns.
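The same check can be run for the poly() case. This is a sketch under assumptions (simulated data, and illustrative names such as term_by_hand and basis_at_mean): it rebuilds the poly(x2, 3) term by centering each basis column at its own mean, then compares the column means of the basis with the basis evaluated at mean(x2).

```r
# Simulated data; all names and values here are illustrative assumptions.
set.seed(3)
dat <- data.frame(x1 = rnorm(100), x2 = runif(100, 0, 4))
dat$y <- 1 + dat$x1 + dat$x2 - 0.3 * dat$x2^2 + rnorm(100)

m2 <- lm(y ~ x1 + poly(x2, 3), data = dat)
X  <- model.matrix(m2)

# Design-matrix columns that belong to the second term, poly(x2, 3).
idx <- which(attr(X, "assign") == 2)
b   <- coef(m2)[idx]

# Rebuild the term by centering each basis column at its own mean.
Xp <- X[, idx, drop = FALSE]
Xc <- sweep(Xp, 2, colMeans(Xp))
term_by_hand <- drop(Xc %*% b)

tt <- predict(m2, type = "terms")
matches <- isTRUE(all.equal(unname(tt[, 2]), unname(term_by_hand)))
print(matches)

# Means of the basis columns versus the basis evaluated at mean(x2).
P <- poly(dat$x2, 3)
basis_means   <- colMeans(P)
basis_at_mean <- predict(P, mean(dat$x2))
print(rbind(basis_means, basis_at_mean = drop(basis_at_mean)))
```

With the orthogonal polynomials that poly() produces by default, the column means are numerically zero, while the basis evaluated at mean(x2) generally is not, so the two reference points differ.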
Can you help me see this more clearly? (Or less wrongly?)
Perhaps you think I don't understand partial residuals in termplot,
but I am pretty sure I do. I made notes about it. See slides 54 and
55 in here: http://pj.freefaculty.org/guides/Rcourse/regression-tableAndPlot-1/regression-tableAndPlot.pdf
--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
More information about the R-devel mailing list