[R] summary[["r.squared"]] gives strange results
Sundar Dorai-Raj
sundar.dorai-raj at pdf.com
Wed Dec 7 12:07:47 CET 2005
Felix Flory wrote:
> I am simulating an ANOVA model and get a strange behavior from the
> summary function. To be more specific: please run the following code
> and see for yourself: the summary()[["r.squared"]] values of two
> identical models are quite different!!
>
> ## 3 x 3 ANOVA of two factors x and z on outcome y
> s.size <- 300 # the sample size
> p.z <- c(0.25, 0.5, 0.25) # the probabilities of factor z
> ## probabilities of x given z
> p.x.z <- matrix(c(40/60, 20/120, 6/60,
> 14/60, 80/120, 14/60,
> 6/60, 20/120, 40/60), 3, 3, byrow = TRUE)
> ## the regression coefficients according to the model.matrix X, that
> ## is computed later
> beta <- c(140, -60, -50, -80, 120, 100, -40, 60, 50)/40
> ## the factor z and the factor x (in dependence of z)
> z <- x <- vector(mode = "integer", length = s.size)
> for(j in 1:s.size) {
> z[j] <- sample(1:3, 1, prob = p.z)
> x[j] <- sample(1:3, 1, prob = p.x.z[, z[j]])
> }
> ## constructing the model.matrix X
> zf <- factor(z)
> contrasts(zf) <- contr.treatment(nlevels(zf), base = 3)
> zm <- model.matrix(~ zf)
> xf <- factor(x)
> contrasts(xf) <- contr.treatment(nlevels(xf), base = 3)
> xm <- model.matrix(~ xf)
> X <- cbind(zm, zm * xm[,2], zm * xm[,3])
> ## the outcome y
> y <- X %*% beta + rnorm(s.size, 0, 4)
> ## the two linear models
> lm.v1 <- lm(y ~ X -1)
> lm.v2 <- lm(y ~ zf * xf)
> ## which are equivalent
> anova(lm.v1, lm.v2)
> print(sd(model.matrix(lm.v1) %*% coef(lm.v1))^2 / sd(y)^2) -
> print(sd(model.matrix(lm.v2) %*% coef(lm.v2))^2 / sd(y)^2)
> ## so far everything is fine but why are the following r.squared
> ## values quite different?
> print(summary(lm.v1)[["r.squared"]]) -
> print(summary(lm.v2)[["r.squared"]])
>
Hi, Felix,
The first model fits your data without an intercept and thus has a
different formula of R^2. The justification should be in any intro
regression text. Here is the relevant snippet from summary.lm:
mss <- if (attr(z$terms, "intercept"))
sum((f - mean(f))^2)
else sum(f^2)
rss <- sum(r^2)
<snip>
ans$r.squared <- mss/(mss + rss)
Try the following to see a direct comparison of the two methods:
lm.v1 <- lm(y ~ X - 1)
lm.v2 <- lm(y ~ zf * xf)
lm.v3 <- lm(y ~ X[, -1])
summary(lm.v1)$r.squared
summary(lm.v2)$r.squared
summary(lm.v3)$r.squared
HTH,
--sundar
More information about the R-help
mailing list