[R] R-squared value for linear regression passing through origin using lm()
R_Goertz at web.de
Thu Oct 18 14:11:55 CEST 2007
Achim Zeileis, Donnerstag, 18. Oktober 2007:
> On Thu, 18 Oct 2007, Toffin Etienne wrote:
> > Hi,
> > A have small technical question about the calculation of R-squared
> > using lm().
> > In a study case with experimental values, it seems more logical to
> > force the regression line to pass through origin with lm(y ~ x +0).
> > However, R-squared values are higher in this case than when I
> > compute the linear regression with lm(y ~ x).
> > This result surprises me: is it normal? Is there a
> > problem with the R-squared value calculated in this case?
> Have you considered reading the documentation? ?summary.lm has
> r.squared: R^2, the 'fraction of variance explained by the model',
> R^2 = 1 - Sum(R[i]^2) / Sum((y[i]- y*)^2),
> where y* is the mean of y[i] if there is an intercept and
> zero otherwise.
I think there is reason to be surprised; I am, too. The fraction of
variance explained should never be smaller when there are two parameters
to fit the data with instead of one. Of course, if mean(y) = 0 anyway,
there should be no difference in R^2 (except that the error df of the
two models differ). What am I missing?
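The resolution is in the formula quoted above: summary.lm() compares
the residual sum of squares against a different baseline in the two
cases, sum((y - mean(y))^2) with an intercept but sum(y^2) without one.
Since sum(y^2) >= sum((y - mean(y))^2), the no-intercept R^2 is computed
against a larger total sum of squares and can easily come out higher,
even though its residual sum of squares is at least as large. A small
illustration (with made-up data; the seed and values are arbitrary):

```r
## Hypothetical data: any y with a mean well away from zero will do
set.seed(1)
x <- 1:20
y <- 2 * x + rnorm(20)

fit0 <- lm(y ~ x + 0)   # regression through the origin
fit1 <- lm(y ~ x)       # with intercept

## Reproduce summary()'s R^2 by hand, using each model's baseline:
## mean(y) when there is an intercept, zero otherwise
r2_with    <- 1 - sum(residuals(fit1)^2) / sum((y - mean(y))^2)
r2_without <- 1 - sum(residuals(fit0)^2) / sum(y^2)

all.equal(r2_with,    summary(fit1)$r.squared)   # TRUE
all.equal(r2_without, summary(fit0)$r.squared)   # TRUE

## The intercept model always has the smaller residual sum of squares
## (it nests the origin model), yet its R^2 is lower here because it
## is measured against the smaller total sum of squares
r2_without > r2_with                             # TRUE
```

So the two R^2 values are simply not comparable: each measures
improvement over a different null model (the mean of y versus the
constant zero).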