# [R-sig-ME] What is the maximum number of groups?

Andrew J Tyre atyre2 at unlnotes.unl.edu
Tue Jan 6 05:10:16 CET 2009

```Hi All,

I've been thinking alot about the earlier productive and helpful (for me!)
discussion on overdispersion and glmms etc. I haven't gotten back to my
grasshopper example yet, but I will as soon as the students complete
another component of the data collection.

I have just come across an example in which I was able to account for most
of the overdispersion - enough to make me happy anyway - but I'm worried
about what I did to do it. The data are the number of traps at a site on
one day that caught or did not catch a deer, so binomial data. There are
several sites sampled in several different years, and a couple of weather
covariates that are the same for all traps at a given site on a particular
day; there are no trap level covariates, but many days of trapping within
each site/year combination. We expect there to be variation among
site/year combinations related to the number of deer at a site, variation
among days due to weather (some covariate data) or other factors, and
variation among traps due to the location of the trap relative to the deer
population.

Some simulated data:

library(lme4)

# simulate some data
x = runif(300,1,10)
siteyear = factor(rep(1:10,each=30))
logit.p = x*0.2 - 0.025*x^2 + rep(rnorm(10,0,0.4),each=30) +
rnorm(300,0,0.2)
p = 1/(1+exp(-logit.p))
size = sample(5:15,300,replace=TRUE)
y = rbinom(300,prob=p,size=size)
siteday=factor(1:300)
mm.1 = lmer(cbind(y,size-y)~x+I(x^2)+(1|siteday),family=binomial)
mm.2 = lmer(cbind(y,size-y)~x+I(x^2)+(1|siteyear),family=binomial)
mm.3 = lmer(cbind(y,size-y)~x+I(x^2)+siteyear+(1|siteday),family=binomial)

mm.list = list(mm.1,mm.2,mm.3)
k = c(4,4,13)
ll = sapply(mm.list,logLik)
# use chat estimate to get quasiAIC
chat = deviance(mm.3)/300

qaic = -2*ll/chat+2*k
delta = qaic-min(qaic)
AIC.table=cbind(qaic,k,delta,w = exp(-(delta/2))/sum(exp(-delta/2)))
AIC.table
qaic  k    delta            w
ML 404.6663  4 78.66631 8.269964e-18
ML 340.3928  4 14.39277 7.487291e-04
ML 326.0000 13  0.00000 9.992513e-01
> chat
ML
1.106134

Now this is pretty close to the structure that I have with the real data.
All models had the (1|siteday) random effect, various combinations of
covariates of interest (x in the example), and then either did or did not
include siteyear as a fixed effect. In the real example, as with my
simulated example, the model with both a fixed effect of siteyear and a
random effect of siteday is the best model. The best model makes
biological sense. The question: is having as many groups as observations
an issue? It seems to work but ... I didn't start worrying about it until
afterwards. Does it work because there are multiple traps per siteday? Any
insights appreciated.

> sessionInfo()
R version 2.7.2 (2008-08-25)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
 stats     graphics  grDevices datasets  utils     methods   base

other attached packages:
 lme4_0.999375-27   Matrix_0.999375-15 lattice_0.17-13

loaded via a namespace (and not attached):
 grid_2.7.2  nlme_3.1-89

Drew Tyre

School of Natural Resources
416 Hardin Hall, East Campus
3310 Holdrege Street
Lincoln, NE 68583-0974

phone: +1 402 472 4054
fax: +1 402 472 2946
email: atyre2 at unl.edu
http://snr.unl.edu/tyre

```