[R] partial residuals & the output of residuals.lm(..., type="partial")
Daniel McGlinn
daniel.mcglinn at okstate.edu
Thu Feb 19 03:57:18 CET 2009
Dear list,
After thinking about it a little more, I have answered my own question
about why I get different residuals from
residuals.lm(...,type="partial") than from dropping a single term and
recalculating the residuals. Unless the two predictors are completely
orthogonal, they are in a sense competing with one another in the full
model, so the coefficient of the remaining variable changes when the
model is refit without the other one. For example, if you make x1 and x2
from my earlier hypothetical example more correlated, the discrepancy
between the two sets of residuals increases. The problem is solved by
using the coefficients from the full model when computing the raw
residuals without the other variable.
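
To make the "competition" concrete, here is a minimal sketch that compares the x1 coefficient with and without x2 in the model. The data are simulated with correlated predictors, and the names full, reduced, b1.full, b1.reduced and slope.x2.on.x1 are illustrative only. It also checks the usual omitted-variable relationship: the x1 slope in the reduced fit should equal the full-model x1 slope plus the full-model x2 slope times the slope from regressing x2 on x1.

```r
## a minimal sketch; all object names here are illustrative
set.seed(12)
x1 <- runif(100)
x2 <- x1 + runif(100)   # x2 is correlated with x1 by construction
y  <- .13 + .25*x1 + .70*x2 + runif(100)

full    <- lm(y ~ x1 + x2)
reduced <- update(full, . ~ . - x2)

## x1 slope in each fit; once x2 is dropped, x1 absorbs the part of
## x2's effect that it is correlated with
b1.full    <- unname(coef(full)["x1"])
b1.reduced <- unname(coef(reduced)["x1"])
print(c(full = b1.full, reduced = b1.reduced))

## slope of x2 on x1, used to link the two estimates
slope.x2.on.x1 <- unname(coef(lm(x2 ~ x1))["x1"])
b1.implied <- b1.full + unname(coef(full)["x2"]) * slope.x2.on.x1
print(all.equal(b1.reduced, b1.implied))
```

Because the reduced fit uses a different x1 coefficient (and intercept), its raw residuals cannot line up with the partial residuals from the full model unless x1 and x2 are uncorrelated in the sample.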
Dan
Here is an example showing that more strongly correlated x-variables
make the discrepancy even worse, followed by a solution to my original
question.
set.seed(12)
x1<-runif(100)
x2<-x1+runif(100) ##this will make x1 and x2 more strongly correlated
##than in the first example (see original message)
y<-.13+.25*x1+.70*x2+runif(100)
mod<-lm(y~x1+x2)
plot(residuals(mod,type="partial")[,2],residuals(update(mod,.~.-x2),type='response'))
abline(0,1)
##note how the degree of scatter increases around the 1:1 line
##here is the solution to the problem
##calculate the residuals by hand and make sure to use the estimated
##coefficients from the full model
##as before I will drop the influence of x2 from the model prediction
calc.resids<-y-cbind(rep(1,100),x1)%*%coef(mod)[-3]
##center the calculated residuals
calc.resids<- calc.resids-mean(calc.resids)
plot(residuals(mod,type="partial")[,2],calc.resids)
abline(0,1)##now all the points fall right on the line
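
The same numbers can apparently be obtained without the hand calculation. The sketch below assumes that residuals(..., type="partial") amounts to the ordinary residuals plus the centered term contributions returned by predict(..., type="terms"), i.e. coefficient times (x minus its mean) for each predictor. The names pr.builtin, pr.terms and pr.x2.byhand are illustrative only; the data are regenerated so the block runs on its own.

```r
## a minimal sketch; object names are illustrative
set.seed(12)
x1 <- runif(100)
x2 <- x1 + runif(100)
y  <- .13 + .25*x1 + .70*x2 + runif(100)
mod <- lm(y ~ x1 + x2)

## partial residuals as returned by residuals.lm (one column per term)
pr.builtin <- residuals(mod, type = "partial")

## ordinary residuals (recycled down each column) plus the centered
## contribution of each term from the full model
pr.terms <- residuals(mod) + predict(mod, type = "terms")

## the x2 column written out explicitly with the full-model coefficient
pr.x2.byhand <- residuals(mod) + coef(mod)["x2"] * (x2 - mean(x2))

print(all.equal(pr.builtin, pr.terms, check.attributes = FALSE))
print(all.equal(unname(pr.builtin[, "x2"]), unname(pr.x2.byhand)))
```

This is why centering calc.resids was needed above: the term contributions are centered, and the ordinary residuals of a model with an intercept already have mean zero.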
-------- Original Message --------
Subject: partial residuals & the output of residuals.lm(...,type="partial")
From: Daniel McGlinn <daniel.mcglinn at okstate.edu>
To: r-help at r-project.org <r-help at r-project.org>
Date: 2/18/2009 7:52 PM
> Dear list,
>
> I would like to know how the function residuals.lm calculates the
> partial residuals from an lm object with more than one predictor
> variable. In other words what is residuals.lm(...,type="partial") doing
> behind the scenes? According to the help file for residuals.lm
> (?residuals.lm), "The partial residuals are a matrix with each column
> formed by omitting a term from the model". Unfortunately, I cannot seem
> to recreate the results of the function "residuals.lm" by simply
> dropping a variable from a model and then calculating the raw residuals
> of the updated model. Can anyone see what I am overlooking? It may be
> helpful to others if I mention that the usage of
> residuals.lm(...,type='partial') by the function termplot is what
> motivated me to look at this function more closely. Below is a simple
> example to illustrate my question:
>
> set.seed(12)
>
> x1 <- runif(100)
> x2 <- runif(100)
>
> y <- .13+.25*x1+.70*x2+runif(100)
>
> mod <- lm(y~x1+x2)
>
> ##let's only consider the partial residuals when x2 is dropped from the
> ##model
> plot(residuals(mod,type="partial")[,2],residuals(update(mod,.~.-x2),type='response'))
> abline(0,1) ##1:1 line
> ##why do the points not all fall on the 1:1 line?
>
> Thanks,
> Dan
>
>
--
Daniel J. McGlinn
Department of Botany, Oklahoma State University
117 LSE Stillwater OK 74078 USA 405-612-1780
http://ecology.okstate.edu/Libra/