[R-sig-ME] unbalanced data in nested lmer model

Ben Bolker bolker at ufl.edu
Mon Mar 29 22:25:12 CEST 2010


Luca Borger wrote:

[Jana Bürger:]
>> Moreover I don't understand your argument that fitting random effects 
>> with fewer than 5 levels was dodgy, as examples in the books often have 3 
>> samples from one beach, or 3 laboratory workers within one laboratory. 
>> These are fewer than 5 levels, are they not?
> 
> These are usually toy datasets to exemplify how the approach works; I do 
> not think they claim that the resulting variance estimates are very 
> reliable (I think the Zuur et al. mixed-effects book has more realistic 
> examples, if I remember correctly). Plus, "level" refers to the number of 
> beaches or the number of labs etc., on which the resulting variance 
> estimates are based; if there are fewer than, say, 5, it appears you might 
> be better off fitting the factor as a fixed effect and not trying to 
> decompose the variance into between-lab and within-lab variation. Anyway, 
> just my 2 cents, and I hope I explained this correctly... 
> 
> See also the wiki page set up by Ben Bolker:
> http://glmm.wikidot.com/faq
> 
> e.g. you might be interested in this entry therein:
> 
> Zero or very small random effects variance estimates
> (...)
> Very small variance estimates, or very large correlation estimates, often 
> indicate unidentifiability/lack of data (either due to exact 
> unidentifiability [e.g. designs that are not replicated at an important 
> level] or weak identifiability [designs that would be workable with more 
> data of the same type]).

  I just added this to the FAQ:

Should I treat factor xxx as fixed or random?

This is in general a far more difficult question than it seems on the
surface. There are many competing philosophies and definitions (see
Gelman 2xxx). One point of particular relevance to 'modern' mixed model
estimation (rather than 'classical' method-of-moments estimation) is
that, for practical purposes, there must be a reasonable number of
random-effects levels (e.g. blocks) — more than 5 or 6 at a minimum.

    e.g., from Crawley (2002) p. 670: "Are there enough levels of the
factor in the data on which to base an estimate of the variance of the
population of effects? No, means [you should probably treat the variable
as] fixed effects."
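
For concreteness, a minimal sketch of the two choices in R (simulated data
and made-up names, just to illustrate the syntax; not from the original
question):

  library(lme4)

  set.seed(1)
  dat <- data.frame(block = factor(rep(c("A", "B", "C"), each = 8)),
                    x     = runif(24))
  dat$y <- 1 + 2 * dat$x +
           rnorm(3, sd = 0.5)[as.integer(dat$block)] +  # between-block noise
           rnorm(nrow(dat), sd = 1)                     # residual noise

  ## block as a fixed effect: one coefficient per level, no variance component
  m_fixed  <- lm(y ~ x + block, data = dat)

  ## block as a random effect: a single between-block variance, which is
  ## poorly determined with only 3 levels
  m_random <- lmer(y ~ x + (1 | block), data = dat)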

Some researchers (who treat fixed vs random as a philosophical rather
than a pragmatic decision) object to this approach.

Treating factors with small numbers of levels as random will in the best
case lead to very imprecise estimates of the random-effects variance; in
the worst case it will lead to various numerical difficulties such as
lack of convergence, zero variance estimates, etc. In the classical
method-of-moments approach these problems do not arise (because the sums
of squares are always well defined as long as there are at least two
units), but the underlying problem of lack of power is there nevertheless.
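
To illustrate the zero/near-zero variance failure mode, here is a hedged
sketch with simulated data: only 3 grouping levels and a weak true
between-group effect, so the estimated 'lab' variance frequently collapses
to (or near) zero.

  library(lme4)

  set.seed(101)
  n_groups <- 3                  # deliberately few random-effect levels
  n_per    <- 10
  lab <- factor(rep(seq_len(n_groups), each = n_per))
  y   <- 2 +
         rnorm(n_groups, sd = 0.1)[as.integer(lab)] +  # weak between-lab effect
         rnorm(n_groups * n_per, sd = 1)               # residual noise

  fit <- lmer(y ~ 1 + (1 | lab))
  VarCorr(fit)   # the 'lab' standard deviation is often estimated at ~0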

   (Contributions welcome!)

-- 
Ben Bolker
Associate professor, Biology Dep't, Univ. of Florida
bolker at ufl.edu / people.biology.ufl.edu/bolker
GPG key: people.biology.ufl.edu/bolker/benbolker-publickey.asc



