[R-sig-ME] Minimum number of levels for mixed model

Ben Bolker bbolker at gmail.com
Sat Feb 9 02:23:51 CET 2013


nrm2010 <nrm2010 at ...> writes:

>  Hello, Ben, Thank you for the response.  I created some confusion
> by stating treatment (trt) instead of the treatment blocks, of which
> there are 3.  The Murtaugh paper seems to take one position on the
> perhaps philosophical issue previously discussed on the forum
> concerning whether or not the model design has to be faithful to the
> experimental design.

  It's not going to work very well to treat the treatment blocks
as a random effect, for the various reasons enumerated in the
FAQ.  I would strongly advise modeling them as fixed effects.
 
> My larger question is how often it will be feasible to use mixed
> models with nested effects if we require a minimum of 5^n samples
> for n levels and we try to be faithful to the experimental design.

  It took me a minute, but I guess by "n" here you mean the number
of *hierarchical* levels?  (I initially took it as the number of
levels of each random factor ... one of the difficulties with
mixed models is the terminology ...)
 
> Thinking of the adage that all models are wrong and some are useful,
>  how wrong are we if the random variable has 3 or 4 levels rather
>  than 5, and how useful are we if we require 5^n samples?  

Again, this is discussed at some length in the FAQ; my personal
philosophical point of view probably comes through there.  I can say from
a basis of experience and guessing (very few rigorous proofs, sorry)
that if you try to fit multilevel models with fewer than 5 levels
per random factor:

* sometimes the model will produce an error
* lots of times you will get estimates of zero variance.  
  * this _might_ represent bias in the estimator, or it might
    represent a weird distribution of the estimator, which might have
    the right mean but a big spike at zero and a long tail.
* I don't have strong evidence for this, but it seems much
more likely that the optimization will fail *silently* and
give you wonky answers
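
A rough way to see the zero-variance point for yourself is a stdlib-only
Python sketch like the one below. This is not the simulation linked at the
end of this message, and it does not call lmer. It assumes a balanced
one-way design with Gaussian group effects and residuals, and it swaps in
the classical ANOVA (method-of-moments) estimator of the among-group
variance, truncated at zero, as a stand-in for a (RE)ML fit. The function
names, the seed, and the parameter values (among-group SD 1, residual SD 2,
5 observations per group) are all illustrative assumptions.

```python
import random
import statistics


def sample_variance(xs):
    """Unbiased sample variance (divides by n - 1)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def anova_variance_estimate(groups):
    """ANOVA estimate of among-group variance for balanced data,
    truncated at zero (negative moment estimates are reported as 0)."""
    n = len(groups[0])  # observations per group (assumed equal)
    group_means = [sum(g) / n for g in groups]
    ms_between = n * sample_variance(group_means)
    ms_within = sum(sample_variance(g) for g in groups) / len(groups)
    return max(0.0, (ms_between - ms_within) / n)


def simulate(n_groups, n_per_group, sd_group, sd_resid, n_sims, rng):
    """Simulate n_sims data sets and return the variance estimates."""
    estimates = []
    for _ in range(n_sims):
        groups = []
        for _ in range(n_groups):
            effect = rng.gauss(0.0, sd_group)  # one random effect per group
            groups.append(
                [effect + rng.gauss(0.0, sd_resid) for _ in range(n_per_group)]
            )
        estimates.append(anova_variance_estimate(groups))
    return estimates


rng = random.Random(2013)  # arbitrary seed, for reproducibility only
results = {}
for k in (3, 5, 20):  # number of levels of the random factor
    est = simulate(k, 5, 1.0, 2.0, 2000, rng)
    results[k] = {
        "frac_zero": sum(e == 0.0 for e in est) / len(est),
        "median": statistics.median(est),
        "min": min(est),
    }
    print(k, "levels:", results[k])
```

The true among-group variance here is 1. Comparing `frac_zero` and the
median across the three settings gives a feel for the "spike at zero,
long tail" behaviour with few levels; the exact numbers depend on the
assumed SDs and group sizes.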

125 samples is a big number in some fields and a small number
in others.  Maybe mixed models _aren't_ useful in your field ...
The fundamental problem, which I think you're going to have trouble
getting around, is that it's very hard to estimate variances reliably
from so few samples.  An analogy would be complaining that you're
having a hard time estimating population means reliably from samples
of size 2 or 3 ...

Remember, also, that the problem is primarily with the top level.
As I hope I made clear previously, the number of 'samples' we
are referring to for nested models is the total number of exchangeable
levels -- for a three-level nested 5/5/5 model, we will have 5
top-level, 25 middle-level, and 125 bottom-level units.  Of course,
if you want to use crossed random effects, you tend to have more
"top-level" units (i.e. more variances to estimate from small
samples -- e.g. 5 plots x 5 years x 10 samples per plot-year =
5 samples for among-plot variance, 5 for among-year variance,
25 for the plot-year interaction, and 250 overall ...)
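
The counting above can be checked with a few lines of Python. This is
only a sketch of the arithmetic for balanced designs; the helper names
`nested_counts` and `crossed_counts` are made up for illustration.

```python
def nested_counts(sizes):
    """Units at each level of a balanced nested design, top level first.
    sizes[i] is the number of units per parent unit at level i."""
    counts = []
    total = 1
    for s in sizes:
        total *= s
        counts.append(total)
    return counts


def crossed_counts(n_plots, n_years, n_per_cell):
    """Units available for each variance term in a balanced
    plots x years design with n_per_cell samples per plot-year."""
    return {
        "plot": n_plots,
        "year": n_years,
        "plot:year": n_plots * n_years,
        "observations": n_plots * n_years * n_per_cell,
    }


print(nested_counts([5, 5, 5]))   # [5, 25, 125]
print(crossed_counts(5, 5, 10))
```

The number that matters for each variance is the count for that term,
not the total number of observations.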

I put together some little sims illustrating the issue: 
http://rpubs.com/bbolker/4187


