[R-sig-ME] What is the maximum number of groups?

Andrew J Tyre atyre2 at unlnotes.unl.edu
Tue Jan 6 05:10:16 CET 2009

Hi All,

I've been thinking alot about the earlier productive and helpful (for me!) 
discussion on overdispersion and glmms etc. I haven't gotten back to my 
grasshopper example yet, but I will as soon as the students complete 
another component of the data collection. 

I have just come across an example in which I was able to account for most 
of the overdispersion - enough to make me happy anyway - but I'm worried 
about what I did to do it. The data are the number of traps at a site on 
one day that caught or did not catch a deer, so binomial data. There are 
several sites sampled in several different years, and a couple of weather 
covariates that are the same for all traps at a given site on a particular 
day; there are no trap level covariates, but many days of trapping within 
each site/year combination. We expect there to be variation among 
site/year combinations related to the number of deer at a site, variation 
among days due to weather (some covariate data) or other factors, and 
variation among traps due to the location of the trap relative to the deer 

Some simulated data:


# simulate some data
x = runif(300,1,10)
siteyear = factor(rep(1:10,each=30))
logit.p = x*0.2 - 0.025*x^2 + rep(rnorm(10,0,0.4),each=30) + 
p = 1/(1+exp(-logit.p))
size = sample(5:15,300,replace=TRUE)
y = rbinom(300,prob=p,size=size)
mm.1 = lmer(cbind(y,size-y)~x+I(x^2)+(1|siteday),family=binomial)
mm.2 = lmer(cbind(y,size-y)~x+I(x^2)+(1|siteyear),family=binomial)
mm.3 = lmer(cbind(y,size-y)~x+I(x^2)+siteyear+(1|siteday),family=binomial)

mm.list = list(mm.1,mm.2,mm.3)
k = c(4,4,13)
ll = sapply(mm.list,logLik)
# use chat estimate to get quasiAIC
chat = deviance(mm.3)/300

qaic = -2*ll/chat+2*k
delta = qaic-min(qaic)
AIC.table=cbind(qaic,k,delta,w = exp(-(delta/2))/sum(exp(-delta/2)))
       qaic  k    delta            w
ML 404.6663  4 78.66631 8.269964e-18
ML 340.3928  4 14.39277 7.487291e-04
ML 326.0000 13  0.00000 9.992513e-01
> chat

Now this is pretty close to the structure that I have with the real data. 
All models had the (1|siteday) random effect, various combinations of 
covariates of interest (x in the example), and then either did or did not 
include siteyear as a fixed effect. In the real example, as with my 
simulated example, the model with both a fixed effect of siteyear and a 
random effect of siteday is the best model. The best model makes 
biological sense. The question: is having as many groups as observations 
an issue? It seems to work but ... I didn't start worrying about it until 
afterwards. Does it work because there are multiple traps per siteday? Any 
insights appreciated.

> sessionInfo()
R version 2.7.2 (2008-08-25) 

LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base 

other attached packages:
[1] lme4_0.999375-27   Matrix_0.999375-15 lattice_0.17-13 

loaded via a namespace (and not attached):
[1] grid_2.7.2  nlme_3.1-89

Drew Tyre

School of Natural Resources
University of Nebraska-Lincoln
416 Hardin Hall, East Campus
3310 Holdrege Street
Lincoln, NE 68583-0974

phone: +1 402 472 4054 
fax: +1 402 472 2946
email: atyre2 at unl.edu

More information about the R-sig-mixed-models mailing list