[Rd] Rsquared bug lm() (PR#10516)
lieven.clement at gmail.com
Fri Dec 14 14:10:25 CET 2007
Full_Name: lieven clement
Version: R version 2.4.0 Patched (2006-11-25 r39997)
OS: i486-pc-linux-gnu
Submission from: (NULL) (157.193.193.180)
summary.lm() does not calculate R² correctly for models without an intercept
when one of the predictor variables is a factor.
To keep one of the factor levels from being treated as the reference class, you
can use -1 in the formula. When you do, R² is not calculated correctly:
> x1<-rnorm(100)
> x2<-c(rep(0,25),rep(10,25),rep(20,25),rep(30,25))
> y<-10*x1+x2+rnorm(100,0,4)
> x2<-as.factor(x2)
> lmtest<-lm(y~-1+x1+x2)
> summary(lmtest)$r.sq
[1] 0.9650201
> 1-sum(lmtest$res^2)/sum((y-mean(y))^2)
[1] 0.9342672
The R² reported by summary() is instead calculated as
> 1-sum(lmtest$res^2)/sum((y)^2)
[1] 0.9650201
apparently because summary.lm() assumes the mean of y to be zero.
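As a minimal sketch of the two calculations side by side, the following compares the total sum of squares taken about zero with the one taken about mean(y). The names r2_uncentered and r2_centered are hypothetical, and set.seed() is added here only so the run is reproducible; neither appears in the session above.

```r
# set.seed() is an addition for reproducibility, not part of the original session.
set.seed(1)
x1 <- rnorm(100)
g  <- rep(c(0, 10, 20, 30), each = 25)
y  <- 10 * x1 + g + rnorm(100, 0, 4)
x2 <- as.factor(g)

fit <- lm(y ~ -1 + x1 + x2)
rss <- sum(residuals(fit)^2)

# Hypothetical names for the two candidate definitions of R^2.
r2_uncentered <- 1 - rss / sum(y^2)              # total SS about zero
r2_centered   <- 1 - rss / sum((y - mean(y))^2)  # total SS about mean(y)

# Which of the two does summary() report for the no-intercept fit?
all.equal(summary(fit)$r.squared, r2_uncentered)
c(uncentered = r2_uncentered, centered = r2_centered)
```

Since sum(y^2) is never smaller than sum((y - mean(y))^2), the uncentered value can only be greater than or equal to the centered one.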
For a model with an intercept, everything seems fine:
> lmtest<-lm(y~x1+x2)
> summary(lmtest)$r.sq
[1] 0.9342672
> 1-sum(lmtest$res^2)/sum((y-mean(y))^2)
[1] 0.9342672
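The matching values of 1-sum(res^2)/sum((y-mean(y))^2) in the two sessions suggest that both formulas describe the same fit. A rough check, assuming the same simulated data (the names fit_noint and fit_int are hypothetical, and set.seed() is again only for reproducibility), is to compare the residuals directly:

```r
# set.seed() is an addition for reproducibility, not part of the original session.
set.seed(1)
x1 <- rnorm(100)
g  <- rep(c(0, 10, 20, 30), each = 25)
y  <- 10 * x1 + g + rnorm(100, 0, 4)
x2 <- as.factor(g)

fit_noint <- lm(y ~ -1 + x1 + x2)  # one dummy column per factor level
fit_int   <- lm(y ~ x1 + x2)       # intercept plus treatment contrasts

# The dummy columns of x2 sum to a column of ones, so both design
# matrices span the same space and the residuals should agree.
same_fit <- isTRUE(all.equal(unname(residuals(fit_noint)),
                             unname(residuals(fit_int))))
same_fit

# Only the reported R^2 differs between the two summaries.
c(no_intercept   = summary(fit_noint)$r.squared,
  with_intercept = summary(fit_int)$r.squared)
```

If the residuals agree, the difference in the reported R² comes only from which total sum of squares is used, not from the fit itself.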