[Rd] Rsquared bug lm() (PR#10516)

Thomas Lumley tlumley at u.washington.edu
Sat Dec 15 00:24:35 CET 2007


This is deliberate and as documented in ?summary.lm. It is not a bug.
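
For reference, ?summary.lm defines R² as 1 - Sum(R[i]^2) / Sum((y[i] - y*)^2), where y* is the mean of y[i] if there is an intercept and zero otherwise. A minimal sketch of that rule, reusing the simulated data from the report below. The seed and the helper name r2_by_hand are assumptions made for illustration; they are not part of the original report or of R itself.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible

# Same simulated data as in the report quoted below
x1     <- rnorm(100)
x2_num <- rep(c(0, 10, 20, 30), each = 25)
y      <- 10 * x1 + x2_num + rnorm(100, 0, 4)
x2     <- as.factor(x2_num)

# Hypothetical helper: applies the rule documented in ?summary.lm.
# The total sum of squares is centred on mean(y) only when the
# model formula contains an intercept; otherwise it is centred on 0.
r2_by_hand <- function(fit) {
  y_obs <- fitted(fit) + residuals(fit)
  has_intercept <- attr(terms(fit), "intercept") == 1
  centre <- if (has_intercept) mean(y_obs) else 0
  1 - sum(residuals(fit)^2) / sum((y_obs - centre)^2)
}

fit_no_int <- lm(y ~ -1 + x1 + x2)  # no intercept: compared against 0
fit_int    <- lm(y ~ x1 + x2)       # intercept: compared against mean(y)

# Both comparisons should agree with summary.lm() up to rounding error
all.equal(summary(fit_no_int)$r.squared, r2_by_hand(fit_no_int))
all.equal(summary(fit_int)$r.squared, r2_by_hand(fit_int))
```

Under this reading, the two R² values in the report differ because the no-intercept fit is compared against a model that predicts zero, not against one that predicts mean(y).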

     -thomas

On Fri, 14 Dec 2007 lieven.clement at gmail.com wrote:

> Full_Name: lieven clement
> Version:  R version 2.4.0 Patched (2006-11-25 r39997)
> OS: i486-pc-linux-gnu
> Submission from: (NULL) (157.193.193.180)
>
>
> summary.lm() does not calculate R² correctly for models without an intercept
> when one of the predictor variables is a factor.
> To avoid having one of the factor levels treated as the reference class,
> you can use -1 in the formula. When you do, R² is not calculated
> correctly.
>
>> x1<-rnorm(100)
>> x2<-c(rep(0,25),rep(10,25),rep(20,25),rep(30,25))
>> y<-10*x1+x2+rnorm(100,0,4)
>> x2<-as.factor(x2)
>> lmtest<-lm(y~-1+x1+x2)
>> summary(lmtest)$r.sq
> [1] 0.9650201
>> 1-sum(lmtest$res^2)/sum((y-mean(y))^2)
> [1] 0.9342672
>
> The R² reported by summary() is calculated as
>> 1-sum(lmtest$res^2)/sum((y)^2)
> [1] 0.9650201
> apparently because summary.lm() assumes the mean of y to be zero.
>
> For a model with an intercept, everything seems OK:
>> lmtest<-lm(y~x1+x2)
>> summary(lmtest)$r.sq
> [1] 0.9342672
>> 1-sum(lmtest$res^2)/sum((y-mean(y))^2)
> [1] 0.9342672
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle


