[R] DF in LME

Thu Jul 3 00:08:17 CEST 2003

Federico Calboli <f.calboli at ucl.ac.uk> writes:

> Dear All,
> 
> I know I am quite obsessive and downright annoying (I apologize about that,
> but it's the way I am), but I would like to get my understanding of the way
> nlme calculates degrees of freedom straight.
> 
> For instance, on page 91 in Pinheiro and Bates (2000), on the example of
> anova(fm2Machine), how is the sum of DF *Pi* corresponding to the terms
> estimated at level *i* calculated? In the example presenting
> anova(fm2Machine), *P1 = 0*. I just fail to see why. Same thing for *P2 =
> 2* (although this seems intuitive, but intuitive could be miles off the
> real reason) and *P3 = 0*.

Each $p_i$ is the total number of degrees of freedom of the fixed-effects
terms estimated at level $i$.  For the model fit to the Machines data, the
intercept (1 d.f.) is estimated at level 0 and the only other fixed-effects
term is Machine.  That factor has 3 levels and so contributes 2 d.f. when
the model also contains the intercept.

There are 3 possible levels, which we number starting with the level
that has the largest groups.  (Note that this is the reverse of the
numbering of the levels in the multilevel modeling literature.)  So
our level 1 would be Worker, level 2 is Machine %in% Worker, and level
3 is individual observations.   This is why the counts of the numbers
of groups are $m_1=6$, $m_2=18$ and $m_3=54$.
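
The group counts can be checked directly from the data.  This is a minimal
sketch, assuming the nlme package (which ships the Machines data) is
installed; the names m1, m2 and m3 are only illustrative and mirror the
$m_i$ notation used here.

```r
library(nlme)   # provides the Machines data set

# Level 1: groups defined by Worker
m1 <- nlevels(factor(Machines$Worker))

# Level 2: groups defined by Machine %in% Worker, i.e. the observed
# Worker-by-Machine combinations
m2 <- nlevels(interaction(Machines$Worker, Machines$Machine, drop = TRUE))

# Level 3: individual observations
m3 <- nrow(Machines)

c(m1 = m1, m2 = m2, m3 = m3)
```

The three values should match the counts quoted above.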

The Machine term varies within the groups determined by Worker but
does not vary within the groups determined by "Machine %in% Worker".
Hence it is estimated at level 2 (in our notation), so $p_2 = 2$ and the
term takes the degrees of freedom corresponding to that level.  No
fixed-effects terms are estimated at level 1 or level 3, so
$p_1 = p_3 = 0$.
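
As a sketch of the arithmetic, assume the rule for the denominator
degrees of freedom at level $i$ as it is stated in Pinheiro and Bates
(2000), with $m_0 = 1$ because the model contains an intercept (the rule
is quoted here as an assumption and should be checked against the book):

    $denDF_i = m_i - (m_{i-1} + p_i)$,   $i = 1, 2, 3$

    $denDF_1 =  6 - ( 1 + 0) =  5$
    $denDF_2 = 18 - ( 6 + 2) = 10$
    $denDF_3 = 54 - (18 + 0) = 36$

Under this rule the Machine term, being estimated at level 2, would be
tested against 10 denominator d.f.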

I should point out that the F- and t-tests are approximate tests at
best.  Both the random-effects parameters and the fixed-effects
parameters are being estimated in these models.  To get F-tests we are
conditioning on the values of the parameters determining the
variance-covariance of the random effects.  There are good reasons to
do this (the random-effects and the fixed-effects parameters are
asymptotically uncorrelated) but the tests are still based on an
approximation.  

From a practical point of view, if the degrees of freedom are
reasonably large then they do not need to be precisely determined.

> Incidentally, should I change the grouping, putting *machine* outside and
> *worker* inside, would anything change?

Yes.  Try it and see.

> A second thing: is there any substantial difference between the classical
> decomposition of DF for an ANOVA and the method used by lme for the
> interaction between a fixed and a random effect, in case the random
> variable is nominal and the fixed one continuous?

As far as I know the classical decompositions become very difficult to
formulate for unbalanced data.  The theory of partitioning the
response space into orthogonal subspaces is very elegant but can be
easily upset by lack of balance.  I tell my students that methods that
only work with balanced data are interesting from a theoretical point
of view but not from a practical point of view.  Observational data is
almost always unbalanced and even data from a balanced, designed
experiment frequently ends up being unbalanced because of missing
data.

-- 
Douglas Bates                            bates at stat.wisc.edu
Statistics Department                    608/262-2598
University of Wisconsin - Madison        http://www.stat.wisc.edu/~bates/