[R] R-squared with and without constant

Tim Calkins
Wed Nov 22 00:00:52 CET 2006

Greetings Listers!

The R-squared value reported by summary() of an lm fit is calculated as

1 - RSS/RSS_m

where RSS_m is the residual sum of squares of a minimal model.  In
most cases the minimal model is simply y = mean(y), but when the
intercept is dropped from the formula (e.g. with - 1), the minimal
model is y = 0.  However, if you then add a constant column by hand,
R still treats y = 0 as the minimal model.  This also changes the
F statistic, its numerator degrees of freedom, and the p-value.
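A minimal sketch of the two baselines, reproducing summary()'s R-squared by hand. It assumes simulated data from set.seed(1), so the numbers will not match the output below, and it renames the constant column to `ones` (a hypothetical name) so it does not mask base R's c(). The helper r2_by_hand() is also hypothetical. It keys on attr(terms(fit), "intercept"), which, as far as I can tell, is the flag summary.lm() checks, so a hand-made constant column does not count as an intercept.

```r
# Sketch: reproduce summary.lm()'s R-squared by hand.
# Assumption: simulated data (set.seed(1)); values differ from the post.
set.seed(1)
a <- rnorm(100, 10, 5)
b <- rnorm(100, 10, 5)
ones <- rep(1, 100)   # hand-made constant (called `c` in the post)

# Hypothetical helper: 1 - RSS/RSS_m with the baseline summary() would use.
r2_by_hand <- function(fit) {
  y   <- model.response(model.frame(fit))
  rss <- sum(residuals(fit)^2)
  # Only the formula's intercept flag matters, not a constant regressor.
  has_int <- attr(terms(fit), "intercept") == 1
  rss_m <- if (has_int) sum((y - mean(y))^2) else sum(y^2)  # y = mean(y) vs y = 0
  1 - rss / rss_m
}

fit1 <- lm(a ~ b)             # intercept in the formula
fit2 <- lm(a ~ b + ones - 1)  # constant supplied by hand

c(by_hand = r2_by_hand(fit1), summary = summary(fit1)$r.squared)
c(by_hand = r2_by_hand(fit2), summary = summary(fit2)$r.squared)
```

Both fits have the same residuals, so the gap between the two R-squared values comes entirely from the choice of RSS_m.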

Is there a way to specify that the R-squared should be calculated
using y = mean(y)?

Here's an example:
>  a <- rnorm(100,10,5)
>  b <- rnorm(100,10,5)
>  c <- rep(1,100)

>  summary(lm(a~b))

Call:
lm(formula = a ~ b)

Residuals:
     Min       1Q   Median       3Q      Max
-11.8677  -3.4442  -0.5625   4.1099  10.5102

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.23724    1.05256   9.726 4.76e-16 ***
b           -0.02942    0.09818  -0.300    0.765
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.799 on 98 degrees of freedom
Multiple R-Squared: 0.0009153,	Adjusted R-squared: -0.009279
F-statistic: 0.08978 on 1 and 98 DF,  p-value: 0.7651

>  summary(lm(a ~ b + c - 1))

Call:
lm(formula = a ~ b + c - 1)

Residuals:
     Min       1Q   Median       3Q      Max
-11.8677  -3.4442  -0.5625   4.1099  10.5102

Coefficients:
  Estimate Std. Error t value Pr(>|t|)
b -0.02942    0.09818  -0.300    0.765
c 10.23724    1.05256   9.726 4.76e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.799 on 98 degrees of freedom
Multiple R-Squared: 0.8146,	Adjusted R-squared: 0.8108
F-statistic: 215.3 on 2 and 98 DF,  p-value: < 2.2e-16
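One possible workaround, sketched under the same assumptions (simulated data from set.seed(1); `ones`, `fit_man` and `null_fit` are hypothetical names). Because the two fits span the same columns, their residuals agree, so the mean-based R-squared follows directly from the RSS. For the F test, anova() comparing the mean-only model lm(a ~ ones - 1) with the full model should give the F on 1 and 98 DF that summary(lm(a ~ b)) reports.

```r
# Sketch: score the no-intercept fit against y = mean(y) instead of y = 0.
# Assumption: simulated data (set.seed(1)); values differ from the post.
set.seed(1)
a <- rnorm(100, 10, 5)
b <- rnorm(100, 10, 5)
ones <- rep(1, 100)

fit_int <- lm(a ~ b)             # intercept via the formula
fit_man <- lm(a ~ b + ones - 1)  # intercept supplied by hand

# Same column space, so the residuals (and hence the RSS) are identical.
all.equal(residuals(fit_int), residuals(fit_man))

# R-squared with y = mean(y) as the minimal model.
rss <- sum(residuals(fit_man)^2)
r2_mean <- 1 - rss / sum((a - mean(a))^2)
r2_mean

# F test of b against the mean-only model (fitted value = mean(a)).
null_fit <- lm(a ~ ones - 1)
anova(null_fit, fit_man)   # F on 1 and 98 DF, as for lm(a ~ b)
```

Letting anova() compare two nested fits keeps the degrees-of-freedom bookkeeping out of hand-written formulas.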

Thanks in advance.


Tim Calkins

