[R-sig-ME] Unbalance design in GLMM

Sun Feb 3 06:04:24 CET 2013

Gabriela Agostini <gabrielaagostini18 at ...> writes:

> differences in amphibian malformations that occur in several ponds
> located in two different areas.
> The random effects are sampling day (samplday) and pond identity
> (pondident). The fixed effects are area (studyarea) and species (sp).
> Ymat is the response variable.
> 
> > class(data$pondident)
> [1] "factor"
> > class(data$samplday)
> [1] "integer"

 Try making samplday a factor ... In fact, your error is the
second one listed under http://glmm.wikidot.com/faq#errors , and
making the grouping variables a factor is the suggested remedy.
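
 A minimal sketch of that remedy, on a small made-up data frame. Only
the column names come from the post; the object name `dat` and all of
its values are invented for illustration.

```r
# Toy stand-in for the poster's data; values are invented for illustration.
dat <- data.frame(
  pondident = factor(c("A", "A", "B", "B", "C", "C")),
  samplday  = c(1L, 2L, 1L, 2L, 1L, 2L)
)

class(dat$samplday)    # "integer", as in the post
levels(dat$samplday)   # NULL, as in the post

# Convert the grouping variable to a factor before fitting the model.
dat$samplday <- factor(dat$samplday)

class(dat$samplday)    # "factor"
levels(dat$samplday)   # "1" "2"
```

 After the conversion, both grouping variables are factors, which is
what the nested term (1|samplday/pondident) needs.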
> 
> > levels(data$pondident)
>  [1] "A"     "arro"  "B"     "C"     "campo" "D"     "E"     "F"     "G"
> [10] "hum"
> > levels(data$samplday)
> NULL

 [snip]

 Lack of balance should not be a problem for GLMMs, unless it's
extreme (e.g. some completely missing combinations of fixed effects,
or all zeros or ones in some random-effect levels, i.e. 
complete separation).  In fact, unbalanced designs are one 
reason that people use 'modern' mixed models rather than
classical method-of-moments ANOVA (which has a hard time
with lack of balance).
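
 One rough way to look for the extreme cases mentioned above is to
tabulate the data before fitting. The sketch below uses invented toy
data and assumes, purely for illustration, that Ymat is a 0/1 response
(the post does not say how Ymat is coded).

```r
# Toy data; all values are invented, and a 0/1 Ymat is an assumption.
dat <- data.frame(
  studyarea = factor(c("a1", "a1", "a1", "a2", "a2", "a2")),
  sp        = factor(c("s1", "s1", "s2", "s1", "s2", "s2")),
  pondident = factor(c("A", "A", "B", "C", "C", "D")),
  Ymat      = c(0, 1, 0, 0, 0, 0)
)

# Cell counts for the fixed-effect combinations; a zero marks a missing cell.
cells <- with(dat, table(studyarea, sp))
any(cells == 0)

# Mean response per pond; a mean of exactly 0 or 1 flags a level whose
# observations are all zeros or all ones.
pond_means <- tapply(dat$Ymat, dat$pondident, mean)
names(pond_means)[pond_means %in% c(0, 1)]
```

 Unequal cell counts by themselves are fine; it is the empty cells and
the all-zero or all-one levels that deserve a closer look.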

> As you notice, it is an unbalanced design, so when I run the model
> 
> > GLMM.c<-lmer(Ymat~studyarea+sp+studyarea*sp+(1|samplday/pondident),
>   data=data,family="binomial")

 By the way, studyarea+sp+studyarea*sp is redundant (although
harmless).   Either

studyarea+sp+studyarea:sp  (main effects + interaction) or
studyarea*sp               (ditto, shorthand) 

should be sufficient.
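
 The equivalence of these formulas can be checked without any data by
asking R which terms each one expands to. This is only a small
illustration; the helper name `term_labels` is made up here.

```r
# Hypothetical helper: return the model terms a formula expands to.
term_labels <- function(f) attr(terms(f), "term.labels")

t_redundant <- term_labels(Ymat ~ studyarea + sp + studyarea * sp)
t_explicit  <- term_labels(Ymat ~ studyarea + sp + studyarea:sp)
t_shorthand <- term_labels(Ymat ~ studyarea * sp)

t_shorthand   # "studyarea" "sp" "studyarea:sp"

# All three formulas yield the same set of terms.
identical(t_redundant, t_explicit) && identical(t_explicit, t_shorthand)
```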

> Error: length(f1) == length(f2) is not TRUE
> In addition: Lost warning messages
> 1: In pondident:samplday :
>   numerical expression has 400 elements: only the first used
> 2: In pondident:samplday :
>   numerical expression has 400 elements: only the first used
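
 The two warnings are consistent with how the ':' operator behaves in
base R: it forms an interaction only when both operands are factors,
and otherwise is treated as the sequence operator, which uses just the
first element of each side. A small sketch with invented vectors (`f`
and `d` are stand-ins for pondident and samplday, not the poster's
data):

```r
f <- factor(c("A", "B", "C"))   # a factor, like pondident
d <- c(3L, 1L, 2L)              # an integer vector, like the original samplday

# Mixed factor/integer: ':' builds a sequence from the first elements only
# and warns about the ignored ones, so the result is not a factor.
bad <- suppressWarnings(f:d)
is.factor(bad)      # FALSE

# Both factors: ':' forms the interaction, one value per observation.
good <- f:factor(d)
is.factor(good)     # TRUE
length(good)        # 3
```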