[R] Is this *always* the intended R^2 value for no intercept in lm?
Thierry Zell
th|erry@ze|| @end|ng |rom gm@||@com
Sat Nov 5 18:36:52 CET 2022
I am puzzled by the computation of R^2 when the intercept is omitted,
which is already illustrated by the following example taken from help("lm"):
## Annette Dobson (1990) "An Introduction to Generalized Linear Models".
## Page 9: Plant Weight Data.
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
lm.D9 <- lm(weight ~ group)
lm.D90 <- lm(weight ~ group - 1) # omitting intercept
The calculations for the R^2 for both models are consistent with the
help("summary.lm") description:
"y* is the mean of y[i] if there is an intercept and zero otherwise."
This causes a dramatic difference in the resulting R^2 values:
r2.D9 <- summary(lm.D9)$r.squared
r2.D90 <- summary(lm.D90)$r.squared
all.equal(r2.D9, 0.0730775989903856) #TRUE
all.equal(r2.D90, 0.981783272435264) #TRUE
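For reference, both values can be reproduced by hand. The sketch below assumes the unweighted case and uses illustrative names (rss, tss.centered, tss.uncentered, r2.centered, r2.uncentered) that do not appear in the help pages. It shows that the residual sum of squares is the same for both fits and that only the baseline y* differs:

```r
# Same data and models as in help("lm").
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
lm.D9 <- lm(weight ~ group)
lm.D90 <- lm(weight ~ group - 1)

# Both fits give the same residuals, so the residual sum of squares is shared.
rss <- sum(residuals(lm.D90)^2)

# Baseline y* = mean(y): total sum of squares around the mean.
tss.centered <- sum((weight - mean(weight))^2)
# Baseline y* = 0: total sum of squares around zero.
tss.uncentered <- sum(weight^2)

r2.centered <- 1 - rss / tss.centered      # agrees with summary(lm.D9)$r.squared
r2.uncentered <- 1 - rss / tss.uncentered  # agrees with summary(lm.D90)$r.squared
```

Since the mean weight is far from zero, tss.uncentered is much larger than tss.centered, which is what inflates the second value.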
This is counter-intuitive, to say the least: the two models have
identical predictions, and both could be described more accurately
as having two intercepts rather than none. I see three
possibilities:
1. This is the intended result, in which case no fix is required, but
I’d be curious to understand the argument better.
2. This is an unfortunate outcome but not worth fixing as the user can
easily compute the correct R^2. In this case, I'd suggest that this
unintuitive behavior should be explicitly called out in the
documentation.
3. This is a bug worth fixing.
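As an illustration of the workaround mentioned in option 2, here is a minimal sketch of a helper that always measures R^2 against the mean-only model. The name centered_r2 is hypothetical (it is not part of base R), and the function assumes an unweighted lm fit:

```r
# Hypothetical helper, not part of base R: R^2 relative to the mean-only
# model, regardless of how the intercept was coded in the formula.
# Assumes an unweighted lm fit.
centered_r2 <- function(fit) {
  y <- model.response(model.frame(fit))  # observed response
  1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)
}

# Same data and models as in help("lm").
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
lm.D9 <- lm(weight ~ group)
lm.D90 <- lm(weight ~ group - 1)

# Both parameterizations now report the same value.
r2.a <- centered_r2(lm.D9)
r2.b <- centered_r2(lm.D90)
```

For a model that truly passes through the origin, this centered value can be negative, which may be part of the rationale for the current default.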
I look forward to hearing the community’s opinion on this.
Thanks in advance!