[R] odd behavior of summary()$r.squared
Sundar Dorai-Raj
sundar.dorai-raj at PDF.COM
Wed Oct 6 21:21:04 CEST 2004
J.R. Lockwood wrote:
> I may be missing something obvious here, but consider the following simple
> dataset simulating repeated measures on 5 individuals with pretty strong
> between-individual variance.
>
> set.seed(1003)
> n<-5
> v<-rep(1:n,each=2)
> d<-data.frame(factor(v),v+rnorm(2*n))
> names(d)<-c("id","y")
>
> Now consider the following two linear models that provide identical fitted
> values, residuals, and estimated residual variance:
>
> m1<-lm(y~id,data=d)
> m2<-lm(y~id-1,data=d)
> print(max(abs(fitted(m1)-fitted(m2))))
>
> The r-squared reported by summary(m1) appears to be correct in that it is
> equal to the squared correlation between the fitted and observed values:
>
> print(summary(m1)$r.squared - cor(fitted(m1),d$y)^2)
>
> However, the same is not true of m2.
>
> print(summary(m2)$r.squared - cor(fitted(m2),d$y)^2)
>
>
>>R.version
>
> _
> platform i686-pc-linux-gnu
> arch i686
> os linux-gnu
> system i686, linux-gnu
> status
> major 1
> minor 9.0
> year 2004
> month 04
> day 12
> language R

I think what you're trying to do is better accomplished by looking at
the anova tables of the two fits:
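# Row 1 of each anova table is the "id" term; column 2 is "Sum Sq".
a1 <- anova(m1)
a2 <- anova(m2)
# R-squared as the share of the total sum of squares explained by id
r2.1 <- a1[1, 2]/sum(a1[, 2])
r2.2 <- a2[1, 2]/sum(a2[, 2])
# Both differences should be zero up to rounding error
summary(m1)$r.squared - r2.1
summary(m2)$r.squared - r2.2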
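The squared correlation you computed with "cor" still adjusts your data
for the grand mean, which m2 doesn't fit. Because m2 has no intercept,
summary() measures its r-squared against the total sum of squares about
zero, sum(y^2), rather than about mean(y).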
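
To make the distinction concrete, here is a small sketch that rebuilds both versions of r-squared by hand from the example data. The names rss, r2_centered and r2_uncentered are introduced here for illustration only; they are not in the original post.

```r
# Recreate the data and the two fits from the original post.
set.seed(1003)
n <- 5
v <- rep(1:n, each = 2)
d <- data.frame(id = factor(v), y = v + rnorm(2 * n))

m1 <- lm(y ~ id, data = d)      # with intercept
m2 <- lm(y ~ id - 1, data = d)  # without intercept

# Residual sum of squares; the same for m1 and m2 (identical fitted values).
rss <- sum(residuals(m2)^2)

# Centered r-squared: total sum of squares taken about the grand mean.
r2_centered <- 1 - rss / sum((d$y - mean(d$y))^2)

# Uncentered r-squared: total sum of squares taken about zero.
r2_uncentered <- 1 - rss / sum(d$y^2)

# These comparisons should all be TRUE.
all.equal(summary(m1)$r.squared, r2_centered)
all.equal(summary(m2)$r.squared, r2_uncentered)
all.equal(cor(fitted(m2), d$y)^2, r2_centered)
```

So the squared correlation matches the centered version for either fit, while summary(m2) reports the uncentered one.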
HTH,
--sundar