[R-sig-ME] Model fit with a Poisson, cross-validation?

Ben Bolker bolker at ufl.edu
Thu May 28 21:38:28 CEST 2009


   I'm not sure I would call this "cross-validation" (unless I
misunderstand what you're trying to do). CV usually means re-fitting the
model with one or more data points held out each time to see how much
the results vary. Gelman and Hill (2006) talk a lot about "posterior
predictive checks" (simulating new data from the fitted model and
comparing them with the observed data), which may be close to what you
have in mind.
  Depending on what you want to do, the raw material would normally
be provided by the simulate() method for a fitted GLMM (mer) object,
but I think it doesn't work in the current released version of
lme4 -- there is a working one in the "allcoef" branch.  An alternative
is to download
<http://glmm.wikidot.com/local--files/basic-glmm-simulation/glmmfuns.R>
and use the my.mer.sim() function to simulate from the fitted model.
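
  As a rough illustration of what "simulate from the fitted model and
compare" could look like, here is a minimal base-R sketch. It is not the
code in glmmfuns.R. The data, the coefficient values, and the names
sim_ranint_pois() and vmr() are all made up for the example; with a real
fit the estimates would presumably be pulled out with fixef() and
VarCorr() rather than typed in by hand.

```r
# Sketch only: all numbers and names below are illustrative assumptions,
# not values from the original analysis.
set.seed(101)

# Stand-in data with the same layout as the question:
# counts of trees per year within site, plus one climate covariate
dat <- expand.grid(year = 1:15, site = factor(1:10))
dat$wy <- rnorm(nrow(dat))
site_eff <- rnorm(nlevels(dat$site), sd = 0.5)
dat$trees <- rpois(nrow(dat),
                   exp(1 + 0.3 * dat$wy - 0.05 * dat$year +
                       site_eff[as.integer(dat$site)]))

# "Estimates" plugged in by hand (assumption); with a fitted model these
# would come from fixef(fit) and VarCorr(fit)
beta    <- c("(Intercept)" = 1, wy = 0.3, year = -0.05)
sd_site <- 0.5

# Simulate one response vector from a Poisson model with a random
# intercept per site, drawing new site effects on every call
sim_ranint_pois <- function(X, site, beta, sd_site) {
  b   <- rnorm(nlevels(site), mean = 0, sd = sd_site)
  eta <- drop(X %*% beta) + b[as.integer(site)]
  rpois(length(eta), lambda = exp(eta))
}

# Fixed-effect design matrix; columns are (Intercept), wy, year
X <- model.matrix(~ wy + year, data = dat)

# Summary statistic to compare: variance-to-mean ratio of the counts
vmr <- function(y) var(y) / mean(y)

sim_stats <- replicate(1000, vmr(sim_ranint_pois(X, dat$site, beta, sd_site)))
obs_stat  <- vmr(dat$trees)

# Proportion of simulated data sets at least as dispersed as the observed
# one; values near 0 or 1 suggest the model misses that feature of the data
p_sim <- mean(sim_stats >= obs_stat)
```

  The variance-to-mean ratio is only one possible summary; the number of
zeros or the maximum count per site could be compared in the same way,
and hist(sim_stats) with abline(v = obs_stat) gives a quick picture.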

  For what it's worth, your description of your fitting process
sounds sensible.

  good luck,
    Ben Bolker

Lindsay Reynolds wrote:
> Hello List,
> 
> I am in the process of learning mixed models in R and have a basic 
> question. I am currently working on a model selection analysis with a 
> suite of mixed models and Poisson-distributed count data. After reading 
> Bolker et al 2009 (Trends In Ecology & Evolution 24:127-135) and having 
> a basic understanding of standard model selection analysis (Burnham & 
> Anderson) I was convinced that I could use the AICc alone to determine 
> the best models. However, it has been suggested to me that I also 
> include some sort of "R^2" value in my analysis to measure absolute fit 
> of the model to the data. Since this does not exist for mixed models 
> with Poisson distributed data, it was further suggested that I try 
> cross-validating my models by building a predicted data set that I could 
> compare to my observed data set. Can anyone point me to references that 
> have done this sort of thing with mixed models in R? I would be much 
> obliged.
> 
> More details on my analysis:
> My data are counts of trees established per year within 'site'. I have 
> built several models that include various combinations of climate 
> variables as fixed explanatory variables and all models have 'site' as a 
> random effect. In every model I include a continuous predictor variable 
> called 'year' that accounts for the fact that we expect there to always 
> be more young trees than older trees due to natural mortality. (year = 
> 1,2,3... n). I have tested for overdispersion using penalized, weighted 
> residual sum of squares (pwrss) divided by the number of observations: 
> pwrss/n. The values range between 0.9 and 2. I have interpreted this to 
> mean that my data are not too overdispersed, so I have continued using the 
> Poisson distribution in my models. Also, I have run all my models with 
> Poisson and with quasiPoisson and the results are very similar. My 
> models look like this, with variations on the fixed effects:
> 
> rosite <- glmer(trees ~ wy + wy1 + year + (1 | site), family = poisson)
> 
> Many thanks,
> Lindsay
> 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Lindsay Reynolds
> Ph.D. Candidate
> Graduate Degree Program in Ecology
> Office location: Forestry 208
> Colorado State University
> Campus Delivery 1472
> Ft. Collins, CO 80523-1878
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models


-- 
Ben Bolker
Associate professor, Biology Dep't, Univ. of Florida
bolker at ufl.edu / www.zoology.ufl.edu/bolker
GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc



