[R-sig-ME] GLMMs with unequal group sizes

Thu Jun 11 05:41:52 CEST 2009

Hello All,

I would like to use GLMMs with a binary response variable (logit link) to 
model the effects of three environmental covariates on whether resource 
units were used or unused by a wildlife species.  I have 15 different study 
areas, and very different numbers of used and unused units in each.  I'm 
interested in using the fixed-effects parameter estimates to predict the 
relative probabilities that resource units will be used across the entire 
population of study areas.  Numbers of used and unused units in each area 
look something like this:

Area    Unused    Used
01         281       2
02        4415       1
03         343      30
04         256       1
05        2052       4
06        4050       1
07         238       2
08         743       3
09        2476      18
10        2524       1
11         805       1
12         754       4
13         272       1
14          52       1
15         124       1
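
To put numbers on how lopsided this is, here is a minimal base-R sketch that tabulates the counts above and computes each area's share of all used units. The object and column names (`counts`, `PropUsed`, `ShareUsed`, `share_3_9`) are labels made up for this illustration and are not part of the real dataset.

```r
# Counts copied from the table above
counts <- data.frame(
  Area   = sprintf("%02d", 1:15),
  Unused = c(281, 4415, 343, 256, 2052, 4050, 238, 743,
             2476, 2524, 805, 754, 272, 52, 124),
  Used   = c(2, 1, 30, 1, 4, 1, 2, 3, 18, 1, 1, 4, 1, 1, 1)
)

# Proportion of units used within each area
counts$PropUsed <- counts$Used / (counts$Used + counts$Unused)

# Each area's share of all used units across the 15 areas
total_used <- sum(counts$Used)
counts$ShareUsed <- counts$Used / total_used

# Combined share of used units contributed by areas 03 and 09
share_3_9 <- sum(counts$Used[counts$Area %in% c("03", "09")]) / total_used

# Areas sorted from largest to smallest share of used units
print(counts[order(-counts$ShareUsed), ], digits = 3)
```

With these counts, areas 03 and 09 together hold 48 of the 71 used units (about 68%), while nine of the 15 areas contribute a single used unit each.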

I've been using study area as the grouping factor for a random intercept and 
random slopes on all three covariates:

library(lme4)
fullmodel <- glmer(Used ~ 1 + x1 + x2 + x3 + (1 + x1 + x2 + x3 | Area),
                   family = binomial, data = mydata)

Using 'glmer', I've been able to fit models to my data without convergence 
issues, model fit is pretty good, and the results seem to make sense.  My 
questions are:  Given that the numbers of used units are so unbalanced across 
areas, to what degree can I generalize across the entire population of 
study areas?  Will my estimates for the fixed effects parameters be so 
reliant on areas 3 and 9 that I'm really just limited to inferences on these 
two areas?  Is there a way to quantify the relative weight of each study 
area in the estimation of the fixed effects parameters (i.e. the degree to 
which I can generalize across the entire population of study areas)?
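
One rough way to look at that last question is a leave-one-area-out refit: drop each area in turn, refit, and see how far the fixed-effects estimates move. The sketch below is only an illustration under stated assumptions. It uses simulated data as a stand-in for `mydata` (which is not shown in this post), a single covariate, and a random intercept only so that it runs quickly. The objects `sim`, `full`, and `shift` and all simulated values are hypothetical.

```r
library(lme4)

set.seed(1)

# Simulated stand-in for 'mydata' (hypothetical values, not the real survey)
n_area <- 15
n_per  <- 200
sim <- data.frame(
  Area = factor(rep(sprintf("%02d", 1:n_area), each = n_per)),
  x1   = rnorm(n_area * n_per)
)
area_int <- rnorm(n_area, sd = 0.5)   # area-level random intercepts
eta <- -2 + 1 * sim$x1 + area_int[as.integer(sim$Area)]
sim$Used <- rbinom(nrow(sim), 1, plogis(eta))

# Fit to all areas (simplified model: one covariate, random intercept only)
full <- glmer(Used ~ x1 + (1 | Area), family = binomial, data = sim)

# Refit with each area dropped in turn; record the change in fixed effects
areas <- levels(sim$Area)
shift <- matrix(NA_real_, nrow = length(areas), ncol = length(fixef(full)),
                dimnames = list(areas, names(fixef(full))))
for (a_i in areas) {
  d_sub <- droplevels(sim[sim$Area != a_i, ])
  fit_i <- glmer(Used ~ x1 + (1 | Area), family = binomial, data = d_sub)
  shift[a_i, ] <- fixef(fit_i) - fixef(full)
}

# Rows with large absolute values mark areas the estimates lean on most
round(shift, 3)
```

With the real data, the same loop could be run with the full formula and `mydata` in place of `sim`. Large shifts when areas 3 or 9 are dropped would suggest the fixed effects depend heavily on those two areas.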

I've read about "borrowing strength", which will certainly play a big role with this 
dataset, but I haven't found any examples of datasets that are as unbalanced 
as mine.

I realize that my questions relate to mixed models in general and less to 
their implementation in R, so I hope I'm not out-of-line in posting these 
questions here.  I'd guess there are probably answers to these questions in 
the literature, so I'd truly appreciate any advice on where I should look 
for more info.

Thanks in advance,

-Grant