[R] R-squared value for linear regression passing through origin using lm()

Thu Oct 18 16:17:38 CEST 2007

S Ellison, Thursday, 18 October 2007:
> >I think there is reason to be surprised, I am, too. ...
> >What am I missing?
> 
> Read the formula and ?summary.lm more closely. The denominator,
> 
> Sum((y[i]- y*)^2) 
> 
> is very large if the mean value of y is substantially nonzero and y*
> is set to 0, as the calculation implies for a forced zero intercept.

But in that case the numerator is very large, too, isn't it? I don't
want to argue, though. You might very well be right. But so far, I have
not managed to create a dataset where R^2 is larger for the model with
forced zero intercept (although I have not tried very hard). It would be
very convincing to see one (Etienne?)
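
A minimal sketch of what such a dataset might look like. The numbers are made up for illustration (they are not data from this thread): y has a large mean and essentially no linear trend in x. The names fit_int, fit_zero, r2_int and r2_zero are just labels chosen here.

```r
# Made-up data: y hovers around 10.5 with no real dependence on x.
x <- 1:10
y <- c(10, 12, 9, 11, 10, 12, 9, 11, 10, 11)

fit_int  <- lm(y ~ x)      # ordinary fit with an intercept
fit_zero <- lm(y ~ x - 1)  # fit forced through the origin

# R^2 as reported by summary.lm for each model
r2_int  <- summary(fit_int)$r.squared
r2_zero <- summary(fit_zero)$r.squared
print(c(with_intercept = r2_int, zero_intercept = r2_zero))

# Residual standard error, for comparison with the R^2 values
print(c(with_intercept = summary(fit_int)$sigma,
        zero_intercept = summary(fit_zero)$sigma))
```

With these numbers the reported R^2 should come out at roughly 0.0003 with the intercept and roughly 0.78 without it, even though the zero-intercept line has the larger residual standard error.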

> In effect, the calculation provides the fraction of sum of squared
> deviations from the mean for the case with intercept, but the fraction
> of sum of squared y ('about' zero) for the non-intercept case. 

I understand the mathematics behind it. But as I said, I thought the
growth of the denominator is more than fully balanced by the growth of
the numerator.
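
One way to check this is to compute both pieces by hand. The sketch below assumes R^2 is written as 1 - RSS/TSS, with TSS taken about mean(y) when there is an intercept and about zero when there is none, as ?summary.lm describes. The data are made up for illustration, and the names rss_*, tss_* and r2 are just labels chosen here.

```r
# Made-up data: large mean in y, almost no linear trend in x.
x <- 1:10
y <- c(10, 12, 9, 11, 10, 12, 9, 11, 10, 11)

# Residual sums of squares for the two models
rss_int  <- sum(resid(lm(y ~ x))^2)
rss_zero <- sum(resid(lm(y ~ x - 1))^2)

# Total sums of squares: about mean(y) with an intercept, about 0 without
tss_mean <- sum((y - mean(y))^2)
tss_zero <- sum(y^2)

# How much each piece grows when the intercept is dropped
print(c(rss_growth = rss_zero / rss_int, tss_growth = tss_zero / tss_mean))

# R^2 rebuilt by hand from those pieces
r2 <- c(with_intercept = 1 - rss_int / tss_mean,
        zero_intercept = 1 - rss_zero / tss_zero)
print(r2)
```

For this particular dataset the residual sum of squares does grow when the intercept is dropped, but the total sum of squares grows by a considerably larger factor, so the ratio RSS/TSS shrinks and the reported R^2 rises. Other datasets may behave differently.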

> This is surprising if you automatically assume that better R^2 means
> better fit. I guess that explains why statisticians tell you not to use
> R^2 as a goodness-of-fit indicator.

IIRC, I have not been told so. Perhaps my teachers were not as good as
they should have been. So what is R^2 good for, if not to indicate the
goodness of fit?

Ralf