[R-sig-ME] What is the maximum number of groups?

Tue Jan 6 19:55:57 CET 2009

Right - I hadn't thought about it that way, but I think a 
"binomial-lognormal" is correct:

log(p/(1-p) = X*Beta + epsilon

episilon~N(0,sigma)

The summary says I have 300 groups and 300 observations, so the random 
effects vector is the same size as the dataset. If I attempt to include 
the "siteyear" level variation as a random effect I get the message 

Error in mer_finalize(ans) : q = 310 > n = 300

which is the error described in the previous posts. So its working here 
because I only have the individual level random effect. 

cheers,

Drew Tyre

School of Natural Resources
University of Nebraska-Lincoln
416 Hardin Hall, East Campus
3310 Holdrege Street
Lincoln, NE 68583-0974

phone: +1 402 472 4054 
fax: +1 402 472 2946
email: atyre2 at unl.edu
http://snr.unl.edu/tyre

Ben Bolker <bolker at ufl.edu> 
Sent by: r-sig-mixed-models-bounces at r-project.org
01/06/2009 09:13 AM

To
Andrew J Tyre <atyre2 at unlnotes.unl.edu>, R Mixed Models 
<r-sig-mixed-models at r-project.org>
cc

Subject
Re: [R-sig-ME] What is the maximum number of groups?

  If I understand what you're doing correctly, you're setting
up an "individual-level" covariate, essentially fitting
a binomial-normal model (or binomial-logitnormal).  That makes sense,
although I'm a bit surprised it worked -- in some versions of lme4 it
has triggered an error message about too many random effect parameters.
For example see

https://stat.ethz.ch/pipermail/r-sig-mixed-models/2008q1/000603.html

and following discussion.

  Ben Bolker

Andrew J Tyre wrote:
> Hi All,
> 
> I've been thinking alot about the earlier productive and helpful (for 
me!) 
> discussion on overdispersion and glmms etc. I haven't gotten back to my 
> grasshopper example yet, but I will as soon as the students complete 
> another component of the data collection. 
> 
> I have just come across an example in which I was able to account for 
most 
> of the overdispersion - enough to make me happy anyway - but I'm worried 

> about what I did to do it. The data are the number of traps at a site on 

> one day that caught or did not catch a deer, so binomial data. There are 

> several sites sampled in several different years, and a couple of 
weather 
> covariates that are the same for all traps at a given site on a 
particular 
> day; there are no trap level covariates, but many days of trapping 
within 
> each site/year combination. We expect there to be variation among 
> site/year combinations related to the number of deer at a site, 
variation 
> among days due to weather (some covariate data) or other factors, and 
> variation among traps due to the location of the trap relative to the 
deer 
> population. 
> 
> Some simulated data:
> 
> library(lme4)
> 
> # simulate some data
> x = runif(300,1,10)
> siteyear = factor(rep(1:10,each=30))
> logit.p = x*0.2 - 0.025*x^2 + rep(rnorm(10,0,0.4),each=30) + 
> rnorm(300,0,0.2)
> p = 1/(1+exp(-logit.p))
> size = sample(5:15,300,replace=TRUE)
> y = rbinom(300,prob=p,size=size)
> siteday=factor(1:300)
> mm.1 = lmer(cbind(y,size-y)~x+I(x^2)+(1|siteday),family=binomial)
> mm.2 = lmer(cbind(y,size-y)~x+I(x^2)+(1|siteyear),family=binomial)
> mm.3 = 
lmer(cbind(y,size-y)~x+I(x^2)+siteyear+(1|siteday),family=binomial)
> 
> mm.list = list(mm.1,mm.2,mm.3)
> k = c(4,4,13)
> ll = sapply(mm.list,logLik)
> # use chat estimate to get quasiAIC
> chat = deviance(mm.3)/300
> 
> qaic = -2*ll/chat+2*k
> delta = qaic-min(qaic)
> AIC.table=cbind(qaic,k,delta,w = exp(-(delta/2))/sum(exp(-delta/2)))
> AIC.table
>        qaic  k    delta            w
> ML 404.6663  4 78.66631 8.269964e-18
> ML 340.3928  4 14.39277 7.487291e-04
> ML 326.0000 13  0.00000 9.992513e-01
>> chat
>       ML 
> 1.106134 
> 
> Now this is pretty close to the structure that I have with the real 
data. 
> All models had the (1|siteday) random effect, various combinations of 
> covariates of interest (x in the example), and then either did or did 
not 
> include siteyear as a fixed effect. In the real example, as with my 
> simulated example, the model with both a fixed effect of siteyear and a 
> random effect of siteday is the best model. The best model makes 
> biological sense. The question: is having as many groups as observations 

> an issue? It seems to work but ... I didn't start worrying about it 
until 
> afterwards. Does it work because there are multiple traps per siteday? 
Any 
> insights appreciated.
> 
>> sessionInfo()
> R version 2.7.2 (2008-08-25) 
> i386-pc-mingw32 
> 
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
> States.1252;LC_MONETARY=English_United 
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
> 
> attached base packages:
> [1] stats     graphics  grDevices datasets  utils     methods   base 
> 
> other attached packages:
> [1] lme4_0.999375-27   Matrix_0.999375-15 lattice_0.17-13 
> 
> loaded via a namespace (and not attached):
> [1] grid_2.7.2  nlme_3.1-89
> 
> Drew Tyre
> 
> School of Natural Resources
> University of Nebraska-Lincoln
> 416 Hardin Hall, East Campus
> 3310 Holdrege Street
> Lincoln, NE 68583-0974
> 
> phone: +1 402 472 4054 
> fax: +1 402 472 2946
> email: atyre2 at unl.edu
> http://snr.unl.edu/tyre
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

-- 
Ben Bolker
Associate professor, Biology Dep't, Univ. of Florida
bolker at ufl.edu / www.zoology.ufl.edu/bolker
GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models