[R-sig-ME] What is the maximum number of groups?

Tue Jan 6 05:10:16 CET 2009

Hi All,

I've been thinking alot about the earlier productive and helpful (for me!) 
discussion on overdispersion and glmms etc. I haven't gotten back to my 
grasshopper example yet, but I will as soon as the students complete 
another component of the data collection. 

I have just come across an example in which I was able to account for most 
of the overdispersion - enough to make me happy anyway - but I'm worried 
about what I did to do it. The data are the number of traps at a site on 
one day that caught or did not catch a deer, so binomial data. There are 
several sites sampled in several different years, and a couple of weather 
covariates that are the same for all traps at a given site on a particular 
day; there are no trap level covariates, but many days of trapping within 
each site/year combination. We expect there to be variation among 
site/year combinations related to the number of deer at a site, variation 
among days due to weather (some covariate data) or other factors, and 
variation among traps due to the location of the trap relative to the deer 
population. 

Some simulated data:

library(lme4)

# simulate some data
x = runif(300,1,10)
siteyear = factor(rep(1:10,each=30))
logit.p = x*0.2 - 0.025*x^2 + rep(rnorm(10,0,0.4),each=30) + 
rnorm(300,0,0.2)
p = 1/(1+exp(-logit.p))
size = sample(5:15,300,replace=TRUE)
y = rbinom(300,prob=p,size=size)
siteday=factor(1:300)
mm.1 = lmer(cbind(y,size-y)~x+I(x^2)+(1|siteday),family=binomial)
mm.2 = lmer(cbind(y,size-y)~x+I(x^2)+(1|siteyear),family=binomial)
mm.3 = lmer(cbind(y,size-y)~x+I(x^2)+siteyear+(1|siteday),family=binomial)

mm.list = list(mm.1,mm.2,mm.3)
k = c(4,4,13)
ll = sapply(mm.list,logLik)
# use chat estimate to get quasiAIC
chat = deviance(mm.3)/300

qaic = -2*ll/chat+2*k
delta = qaic-min(qaic)
AIC.table=cbind(qaic,k,delta,w = exp(-(delta/2))/sum(exp(-delta/2)))
AIC.table
       qaic  k    delta            w
ML 404.6663  4 78.66631 8.269964e-18
ML 340.3928  4 14.39277 7.487291e-04
ML 326.0000 13  0.00000 9.992513e-01
> chat
      ML 
1.106134 

Now this is pretty close to the structure that I have with the real data. 
All models had the (1|siteday) random effect, various combinations of 
covariates of interest (x in the example), and then either did or did not 
include siteyear as a fixed effect. In the real example, as with my 
simulated example, the model with both a fixed effect of siteyear and a 
random effect of siteday is the best model. The best model makes 
biological sense. The question: is having as many groups as observations 
an issue? It seems to work but ... I didn't start worrying about it until 
afterwards. Does it work because there are multiple traps per siteday? Any 
insights appreciated.

> sessionInfo()
R version 2.7.2 (2008-08-25) 
i386-pc-mingw32 

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
States.1252;LC_MONETARY=English_United 
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base 

other attached packages:
[1] lme4_0.999375-27   Matrix_0.999375-15 lattice_0.17-13 

loaded via a namespace (and not attached):
[1] grid_2.7.2  nlme_3.1-89

Drew Tyre

School of Natural Resources
University of Nebraska-Lincoln
416 Hardin Hall, East Campus
3310 Holdrege Street
Lincoln, NE 68583-0974

phone: +1 402 472 4054 
fax: +1 402 472 2946
email: atyre2 at unl.edu
http://snr.unl.edu/tyre