[R] summary[["r.squared"]] gives strange results

Wed Dec 7 12:07:47 CET 2005

Felix Flory wrote:
> I am simulating an ANOVA model and get a strange behavior from the
> summary function. To be more specific: please run the following code
> and see for yourself: the summary()[["r.squared"]] values of two
> identical models are quite different!!
> 
> ## 3 x 3 ANOVA of two factors x and z on outcome y
> s.size <- 300 # the sample size
> p.z <- c(0.25, 0.5, 0.25) # the probabilities of factor z
> ## probabilities of x given z
> p.x.z <- matrix(c(40/60, 20/120, 6/60,
>                   14/60, 80/120, 14/60,
>                    6/60, 20/120, 40/60), 3, 3, byrow = TRUE)
> ## the regression coefficients according to the model.matrix X, that
> ## is computed later
> beta <- c(140, -60, -50, -80, 120, 100, -40,  60,  50)/40
> ## the factor z and the factor x (in dependence of z)
> z <- x <- vector(mode = "integer", length = s.size)
> for(j in 1:s.size) {
>   z[j] <- sample(1:3, 1, prob = p.z)
>   x[j] <- sample(1:3, 1, prob = p.x.z[, z[j]])
> }
> ## constructing the model.matrix X
> zf <- factor(z)
> contrasts(zf) <- contr.treatment(nlevels(zf), base = 3)
> zm <- model.matrix(~ zf)
> xf <- factor(x)
> contrasts(xf) <- contr.treatment(nlevels(xf), base = 3)
> xm <- model.matrix(~ xf)
> X <- cbind(zm, zm * xm[,2], zm * xm[,3])
> ## the outcome y
> y <- X %*% beta + rnorm(s.size, 0, 4)
> ## the two linear models 
> lm.v1 <- lm(y ~ X -1)
> lm.v2 <- lm(y ~ zf * xf)
> ## which are equivalent
> anova(lm.v1, lm.v2)
> print(sd(model.matrix(lm.v1) %*% coef(lm.v1))^2 / sd(y)^2) - 
>   print(sd(model.matrix(lm.v2) %*% coef(lm.v2))^2 / sd(y)^2)
> ## so far everything is fine but why are the following r.squared
> ## values quite different?
> print(summary(lm.v1)[["r.squared"]]) -
>   print(summary(lm.v2)[["r.squared"]])
> 

Hi, Felix,

The first model fits your data without an intercept and thus has a 
different formula of R^2. The justification should be in any intro 
regression text. Here is the relevant snippet from summary.lm:

         mss <- if (attr(z$terms, "intercept"))
             sum((f - mean(f))^2)
         else sum(f^2)
         rss <- sum(r^2)
         <snip>
         ans$r.squared <- mss/(mss + rss)

Try the following to see a direct comparison of the two methods:

lm.v1 <- lm(y ~ X - 1)
lm.v2 <- lm(y ~ zf * xf)
lm.v3 <- lm(y ~ X[, -1])

summary(lm.v1)$r.squared
summary(lm.v2)$r.squared
summary(lm.v3)$r.squared

HTH,

--sundar