[R] R-squared value for linear regression passing through origin using lm()

Thu Oct 18 17:25:04 CEST 2007

On Thu, 18 Oct 2007, Ralf Goertz wrote:

> S Ellison, Donnerstag, 18. Oktober 2007:
>>> I think there is reason to be surprised, I am, too. ...
>>> What am I missing?
>>
>> Read the formula and ?summary.lm more closely. The denominator,
>>
>> Sum((y[i]- y*)^2)
>>
>> is very large if the mean value of y is substantially nonzero and y*
>> set to 0 as the calculation implies for a forced zero intercept.
>
> But in that case the numerator is very large, too, isn't it? I don't
> want to argue, though. You might very well be right. But so far, I have
> not managed to create a dataset where R^2 is larger for the model with
> forced zero intercept (although I have not tried very hard). It would be
> very convincing to see one (Etienne?)
>

Consider the data set
     (a+1, a+1)
     (a+2, a+2)
     (a+3, a+2)

For any a>0, the line with zero intercept has residual mean square less than 1 (in fact, below 1/3), so the residual sum of squares is less than 3. The sum of squares around zero is about 3a^2, so the r^2 for the zero-intercept model is more than 1-1/a^2.

The r^2 for the model with intercept does not depend on a: it is 0.75.
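
A quick way to check this in R is sketched below. The helper name r2_pair and the particular values of a are just illustrative choices, not anything from the thread; the sketch relies on summary.lm using the sum of squares around zero when the intercept is dropped, as documented in ?summary.lm.

```r
# Compare r^2 from lm() with and without an intercept on the
# three-point data set above, for a given offset a.
# (r2_pair is a hypothetical helper name used only for this sketch.)
r2_pair <- function(a) {
  x <- a + c(1, 2, 3)
  y <- a + c(1, 2, 2)
  fit0 <- lm(y ~ x - 1)  # intercept forced to zero
  fit1 <- lm(y ~ x)      # ordinary fit with intercept
  c(zero_intercept = summary(fit0)$r.squared,
    with_intercept = summary(fit1)$r.squared)
}

# The zero-intercept r^2 should creep towards 1 as a grows,
# while the with-intercept r^2 should stay put.
for (a in c(1, 10, 100)) {
  cat("a =", a, "\n")
  print(r2_pair(a))
}

# Reproduce the zero-intercept r^2 by hand for one value of a:
# 1 - RSS / sum(y^2), i.e. the denominator is taken around zero.
a <- 10
x <- a + c(1, 2, 3)
y <- a + c(1, 2, 2)
fit0 <- lm(y ~ x - 1)
r2_by_hand <- 1 - sum(resid(fit0)^2) / sum(y^2)
print(r2_by_hand)
```

If the argument above is right, the with_intercept column should read 0.75 for every a, and the zero_intercept column should exceed 1-1/a^2 and match the by-hand value.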

         -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle