# [R] Newbie help with ANOVA and lm.

rkevinburton at charter.net rkevinburton at charter.net
Sat Feb 27 16:53:07 CET 2010

```
Would someone be so kind as to explain in plain English what the ANOVA code (anova.lm) is doing? I am having a hard time reconciling the brute-force regression calculations shown in textbooks with the formula-based algorithm in R. Specifically, I see:

p <- object$rank
if (p > 0L) {
    p1 <- 1L:p
    comp <- object$effects[p1]
    asgn <- object$assign[object$qr$pivot][p1]
    nmeffects <- c("(Intercept)", attr(object$terms, "term.labels"))
    tlabels <- nmeffects[1 + unique(asgn)]
    ss <- c(unlist(lapply(split(comp^2, asgn), sum)), ssr)
    df <- c(unlist(lapply(split(asgn, asgn), length)), dfr)
} else {
    ss <- ssr
    df <- dfr
    tlabels <- character(0L)
}
ms <- ss/df
f <- ms/(ssr/dfr)
P <- pf(f, df, dfr, lower.tail = FALSE)

I think I understand the check for 'p' being non-zero. 'p' is essentially the number of columns in the model matrix (including the intercept column if there is one). So in a regression with an intercept and one term (like dist ~ speed) you would have a model matrix with a column of 1s followed by a column of data, and 'assign' would be the vector [0,1]. Then, to find the degrees of freedom, the code splits the 'assign' vector by itself. I am having a hard time seeing how this could ever produce degrees of freedom that differ between terms. As I read it, the vector 'df' would always be something like [2,2,dfr], but that is obviously wrong. Would someone care to enlighten me on what the code above is doing?

Thank you.

Kevin

```
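For anyone reading along, here is a minimal sketch of the bookkeeping the snippet performs, run on a fitted model so the intermediate vectors can be inspected. It is not the code of anova.lm itself. The built-in `warpbreaks` data and the model `breaks ~ wool + tension` are assumptions chosen for illustration (they do not appear in the post), and `ssr` and `dfr` are computed here from the fit because the quoted excerpt does not show where they come from.

```r
# Illustrative fit on built-in data (an assumption, not from the original post).
fit <- lm(breaks ~ wool + tension, data = warpbreaks)

p    <- fit$rank                             # number of estimable coefficients
p1   <- seq_len(p)
comp <- fit$effects[p1]                      # one effect per model-matrix column
asgn <- fit$assign[fit$qr$pivot][p1]         # term index of each column: 0 1 2 2

# Residual sum of squares and residual degrees of freedom, computed directly.
ssr <- sum(residuals(fit)^2)
dfr <- df.residual(fit)

# split(asgn, asgn) groups the columns by term, so length() counts the
# columns belonging to each term: that count is the term's degrees of freedom.
df <- c(sapply(split(asgn, asgn), length), dfr)   # 1 1 2 50
# Summing squared effects within each group gives the term's sum of squares.
ss <- c(sapply(split(comp^2, asgn), sum), ssr)

# Dropping the first (intercept) entry should match the printed ANOVA table.
all.equal(unname(ss[-1]), anova(fit)$`Sum Sq`)
```

The point is that `split(asgn, asgn)` returns one group per distinct term index, and the length of each group is the number of model-matrix columns that term contributes. A three-level factor such as `tension` contributes two columns, hence 2 degrees of freedom. For `dist ~ speed`, `asgn` is `c(0, 1)`, so each group has length 1 and `df` is `[1, 1, dfr]`, not `[2, 2, dfr]`. As far as I can tell, the intercept row is dropped later in anova.lm, which is why it does not show up in the printed table.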