[R] Newbie help with ANOVA and lm.

Peter Ehlers ehlers at ucalgary.ca
Sat Feb 27 17:55:41 CET 2010

On 2010-02-27 8:53, rkevinburton at charter.net wrote:
> Would someone be so kind as to explain in English what the ANOVA code (anova.lm) is doing? I am having a hard time reconciling what the textbooks have as a brute force regression and the formula algorithm in 'R'. Specifically I see:
>      p <- object$rank
>      if (p > 0L) {
>          p1 <- 1L:p
>          comp <- object$effects[p1]
>          asgn <- object$assign[object$qr$pivot][p1]
>          nmeffects <- c("(Intercept)", attr(object$terms, "term.labels"))
>          tlabels <- nmeffects[1 + unique(asgn)]
>          ss <- c(unlist(lapply(split(comp^2, asgn), sum)), ssr)
>          df <- c(unlist(lapply(split(asgn, asgn), length)), dfr)
>      }
>      else {
>          ss <- ssr
>          df <- dfr
>          tlabels <- character(0L)
>      }
>      ms <- ss/df
>      f <- ms/(ssr/dfr)
>      P <- pf(f, df, dfr, lower.tail = FALSE)
> I think I understand the check for 'p' being non-zero. 'p' is essentially the number of terms in the model matrix (including the intercept term if it exists). So in a mathematical description of a regression that included the intercept and one term (like dist ~ speed) you would have a model matrix of a column of '1's and then a column of data. The 'assign' would be a vector containing [0,1]. So then in finding the degrees of freedom you split the assign matrix with itself. I am having a hard time seeing that this ever produces degrees of freedom that are different. So I get that the vector 'df' would always be something like [2,2,dfr]. But that is obviously wrong. Would someone care to enlighten me on what the code above is doing?

split(asgn, asgn) splits the vector (not matrix) 'asgn' into
list components, one for each distinct term index. lapply() then
applies length() to each component, which counts the model-matrix
columns belonging to that term, i.e. its degrees of freedom.
unlist() removes the list structure, producing a vector of dfs.
For simple regression, asgn = c(0,1), so this results in c(1,1).
The residual df is then tacked on to give the df-vector
df = c(1,1,dfr). For models with an intercept the first component
of df should always be 1, but that intercept row is dropped from
the final table.
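
To see this for yourself, here is a minimal sketch that repeats the
same steps outside of anova.lm. It assumes the built-in 'cars' data
for your dist ~ speed example; the names 'fit' and 'df_terms' are
mine, not from the anova.lm source.

```r
# Fit the simple regression from the question (built-in 'cars' data assumed).
fit <- lm(dist ~ speed, data = cars)

p    <- fit$rank                               # number of estimable coefficients
asgn <- fit$assign[fit$qr$pivot][seq_len(p)]   # term index of each column: 0 1

# One list component per term; its length is that term's df.
df_terms <- unlist(lapply(split(asgn, asgn), length))

# Tack on the residual df, as anova.lm does.
dfr <- df.residual(fit)
df  <- c(df_terms, dfr)                        # values 1, 1, 48

# Without the intercept entry this should agree with anova().
all(df[-1] == anova(fit)$Df)
```

Each component of split(asgn, asgn) has length 1 here because each
term owns exactly one column of the model matrix.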

With two numerical predictors: y ~ x1 + x2,
you should find that asgn = c(0,1,2) leading to df = c(1,1,1,dfr).
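
A rough check of that case, plus one where the dfs do differ. This
is only a sketch: it assumes the built-in 'mtcars' data, and the
factor example (factor(cyl), three levels) is my own addition to
show a term that owns more than one column.

```r
# Helper (hypothetical name) repeating the df computation from anova.lm.
term_df <- function(fit) {
  asgn <- fit$assign[fit$qr$pivot][seq_len(fit$rank)]
  c(unlist(lapply(split(asgn, asgn), length)), df.residual(fit))
}

# Two numeric predictors: asgn is 0 1 2, one column per term.
fit2 <- lm(mpg ~ wt + hp, data = mtcars)
df2  <- term_df(fit2)                 # values 1, 1, 1, 29

# A three-level factor is coded as two columns: asgn is 0 1 2 2.
fit3 <- lm(mpg ~ wt + factor(cyl), data = mtcars)
df3  <- term_df(fit3)                 # values 1, 1, 2, 28

# Dropping the intercept entry should reproduce the Df column of anova().
all(df3[-1] == anova(fit3)$Df)
```

So the dfs differ whenever a term (a factor, poly(), etc.)
contributes more than one column to the model matrix.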

   -Peter Ehlers

> Thank you.
> Kevin

Peter Ehlers
University of Calgary
