# [R] between-within anova: aov and lme

Spencer Graves spencer.graves at pdf.com
Sat Aug 12 12:28:26 CEST 2006

```	  To understand why this works, you need to understand the math in a
more general formulation.  Ordinary least squares can be written in
matrix / vector notation as follows:

y = X %*% b + e,

where y and e are N x 1 vectors, X is an N x k matrix, and b is a k x 1
vector.  In this formulation, e follows a multivariate normal
distribution with mean 0 and covariance = s.e^2 times the N x N identity
matrix.
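
# A minimal sketch of the OLS formulation above, using simulated data.
# The sizes, coefficients and names (N, k, b.true, b.hat) are assumptions
# made for illustration only; they are not from the original question.
set.seed(1)
N <- 50; k <- 3
X <- cbind(1, matrix(rnorm(N * (k - 1)), N, k - 1))  # N x k design matrix
b.true <- c(2, -1, 0.5)                               # k x 1 coefficients
e <- rnorm(N, sd = 0.3)                               # e ~ N(0, s.e^2 * I)
y <- drop(X %*% b.true + e)

# Least squares solves the normal equations t(X) %*% X %*% b = t(X) %*% y
b.hat <- drop(solve(crossprod(X), crossprod(X, y)))

# 'lm' should give the same estimates ("- 1" because X already holds
# the intercept column)
fit <- lm(y ~ X - 1)
all.equal(b.hat, unname(coef(fit)))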

For mixed effects, e is assumed to follow a multivariate normal
distribution with a more general variance-covariance structure,
specified in various ways as discussed in Pinheiro and Bates (2000)
Mixed-Effects Models in S and S-Plus (Springer).  If e ~ N(0, W), then
the maximum likelihood estimates for "b" in the above model can be
written as follows:

b = solve(t(X) %*% solve(W, X)) %*% t(X) %*% solve(W, y).
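
# A sketch of the generalized least squares estimate
#   b = (X' W^-1 X)^-1 X' W^-1 y
# for a known W.  The compound-symmetric W below (4 subjects with 3
# observations each, variances 1 and 0.5) is a made-up example; in 'lme'
# the parameters of W are estimated from the data, not supplied.
set.seed(2)
n.subj <- 4; n.obs <- 3; N <- n.subj * n.obs
X <- cbind(1, rnorm(N))
# block-diagonal W: shared subject variance 1 plus residual variance 0.5
W <- kronecker(diag(n.subj), matrix(1, n.obs, n.obs)) + 0.5 * diag(N)
L <- t(chol(W))                                # W = L %*% t(L)
y <- drop(X %*% c(1, 2) + L %*% rnorm(N))      # e ~ N(0, W)

# GLS estimate, computed without forming the inverse of W explicitly
b.gls <- drop(solve(t(X) %*% solve(W, X), t(X) %*% solve(W, y)))

# Check: premultiplying y and X by the inverse of L turns GLS into OLS
yw <- forwardsolve(L, y)
Xw <- forwardsolve(L, X)
b.white <- unname(coef(lm(yw ~ Xw - 1)))
all.equal(b.gls, b.white)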

As explained by Pinheiro and Bates, we estimate the fixed effects,
"b", using maximum likelihood (ML) and parameters in "W" using
"restricted maximum likelihood (REML)".

The standard analysis of variance is then obtained from the
"likelihood ratio" for nested models.  In certain special cases, a
monotonic transformation of a likelihood ratio follows an F distribution
with degrees of freedom computed from the ranks of various matrices.
The approach provides a unified way of analyzing data with mixed effects
that does not care whether the design is balanced or not.
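
# A sketch of a likelihood ratio comparison of nested models, assuming
# the 'nlme' package and its bundled 'Orthodont' data (an example chosen
# here, not the data from the original question).  Both models are fit
# by ML because they differ in their fixed effects.
library(nlme)
fm0 <- lme(distance ~ age, random = ~ 1 | Subject,
           data = Orthodont, method = "ML")
fm1 <- lme(distance ~ age * Sex, random = ~ 1 | Subject,
           data = Orthodont, method = "ML")
lrt <- anova(fm0, fm1)   # likelihood ratio test of fm0 against fm1
lrt
# For a single fit, anova() reports an F test for each fixed-effect term
anova(fm1)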

Analyses following this method may not always give the same answers
as textbooks that discuss standard balanced designs.  However, I'm not
prepared to discuss that.

Hope this helps.
Spencer Graves

##############################################
William Simpson wrote:
> Hi Spencer
>
>> 	  'lme' is smart enough to figure out from the data whether a factor is
>> 'between' or 'within' or partially one or the other.  This allows you
>> to avoid worrying about that during data analysis -- except as a check on
>> factor coding.
> Just to check Spencer, the following lme() statement:
> lme(y~a*b*c,random=~1|s, data=d)
> will work for any combination of a,b,c as between or within factors.
> At one extreme a,b,c could all be between subjects, at the other
> extreme a,b,c could all be within subjects, and any other combo of
> between/within.
>
> That is a bit mind-bending. So far as lme is concerned all that
> matters is that s is a random effect. It will probably be difficult
> to convince experimental psychologists who consider themselves to be
> experts in the statistical analysis of experiments.
>
> Cheers
> Bill
>
#################################
following with 'lme':

lme(response~A*B*C,random=~1|subject)

This assumes that A, B, and C are fixed effects, either continuous
variables or factors present at only a very few levels whose effects are
not reasonably modeled as a random sample from some other distribution.
It also assumes that the effect of each level of subject can be
reasonably modeled as a random adjustment to the intercept following a
common distribution with mean 0 and variance = 'var.subj'.
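
# A sketch with simulated data, assuming 12 hypothetical subjects, with
# A varying between subjects and B and C varying within subjects.  All
# names and effect sizes are made up for illustration.
library(nlme)
set.seed(3)
d <- expand.grid(B = factor(1:2), C = factor(1:2), subject = factor(1:12))
d$A <- factor(ifelse(as.integer(d$subject) <= 6, "a1", "a2"))  # between
subj.eff <- rnorm(12, sd = 1)               # random intercept per subject
d$response <- 10 + (d$A == "a2") + 0.5 * (d$B == "2") +
  subj.eff[as.integer(d$subject)] + rnorm(nrow(d), sd = 0.5)

# The same call is used whichever factors are between or within
fit <- lme(response ~ A * B * C, random = ~ 1 | subject, data = d)
tab <- anova(fit)
tab            # compare denDF for A with denDF for B, C and interactions
VarCorr(fit)   # estimated subject ('var.subj') and residual variances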

The function 'aov' is old and mostly obsoleted by 'lme' in the 'nlme'
package.  There may be things that can be done in 'aov' that cannot be
done more or less as easily, and usually better and more generally, with
'lme', but I'm not familiar with such cases.

Your question suggests you may not be familiar with Pinheiro and
Bates (2000) Mixed-Effects Models in S and S-Plus (Springer).  The
standard R distribution comes with a directory "~library\nlme\scripts"
containing script files 'ch01.R', 'ch02.R', ..., 'ch06.R', and 'ch08.R'.
These files contain the R code for each chapter in the book.  I've
learned a lot from walking through the script files line by
line while reviewing the corresponding text in the book.  Doing so
protects me from problems with silly typographical errors as well as
subtle problems where the S-Plus syntax in the book gives a different
answer in R because of the few differences in the syntax between S-Plus
and R.
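
# A portable way to locate those script files from within R, assuming
# the 'nlme' package is installed ('scripts.dir' is just a name used
# here):
library(nlme)
scripts.dir <- system.file("scripts", package = "nlme")
list.files(scripts.dir)
# To read one of them: file.show(file.path(scripts.dir, "ch01.R"))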

Hope this helps.
Spencer Graves

William Simpson wrote:
> I have 2 questions on ANOVA with 1 between subjects factor and 2 within factors.
>
> 1. I am confused on how to do the analysis with aov because I have seen two examples
> on the web with different solutions.
>
> a) Jon Baron (http://www.psych.upenn.edu/~baron/rpsych/rpsych.html) does
> 6.8.5 Example 5: Stevens pp. 468 - 474 (one between, two within)
>
> between: gp
> within: drug, dose
> aov(effect ~ gp * drug * dose + Error(subj/(dose*drug)), data=Ela.uni)
>
> b) Bill Venables answered a question on R help as follows.
>
> - factor A between subjects
> - factors B*C within subjects.
>
> aov(response ~ A*B*C + Error(subject), Kirk)
> "An alternative formula would be response ~ A/(B*C) + Error(subject), which
> would only change things by grouping together some of the sums of squares."
>
> -------------------------------------------------------
> SO: which should I do????
> aov(response ~ A*B*C + Error(subject), Kirk)
> aov(response ~ A/(B*C) + Error(subject), Kirk)
> aov(response ~ A*B*C + Error(subject/(B*C)), Kirk)
> --------------------------------------------------------
>
> 2. How would I do the analysis in lme()?
> Something like
> lme(response~A*B*C,random=~1|subject/(B*C))???
>
>
> Thanks very much for any help!
> Bill Simpson