[R-sig-ME] Dealing with Overdispersion in Count Data with Mixed Modeling

Thu Nov 11 20:43:04 CET 2010

On 11/11/2010 02:11 PM, David Stainbrook wrote:
> Ben,
> 
> Putting aside the issues of model convergence, I would like to focus on
> the issue of how to deal with overdispersion in count data. Typically, a
> negative binomial or quasi-poisson model deals with overdispersion in
> situations where it is not appropriate to use Poisson. However, to my
> knowledge, lme4 does not allow multiple random effects using the
> negative binomial family 

  [correction: it doesn't implement negative binomial models at all,
although I believe that *in* principle this could be added by using an
additional 'k' parameter that controlled the mean-variance relationship,
in a way analogous to glm.nb() in the MASS package].

and quasi-poisson has issues with the
> likelihood estimation. As you suggested in a previous email, and from
> the article at
> <http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=82701
> <http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=82701>>,
> translating the model to a lognormal-Poisson model by using an
> individual-level random variable and using family=poisson should deal
> with the overdispersion.

  Yes.

> 
> I wanted to clarify what you meant by the individual level to make sure
> that I am doing it correctly. I already have a random intercept for each
> individual (1|LT), where LT=lowest tag number for each individual, to
> account for pseudoreplication. For the individual-level random variable
> to deal with overdispersion, did you mean adding (1|Observation), where
> Observation is a vector from 1 to the total number of observations or
> records in the dataset?

  yes, exactly.

> 
> This is demonstrated here in a very simple model with no covariates:
> model_1<-lmer(Y~1 + (1|LT) + (1|Observation), data=df3, ! family=poisson)
> 
> and here with an additional covariate (covx) with both fixed and random
> effects:
> model_2<-lmer(Y~1 + (1+covx|LT) + (1|Observation) + covx, data=df3, 
> family=poisson)
> 
> Is this what you meant and is it coded correctly?
> 
> I tried this code with an older version of R and lme4, and it yielded an
> error and no summary data, but after updating both the version of R and
> the lme4 library, I received a warning "Number of levels of a grouping
> factor for the random effects is *equal* to n, the number of
> observations", but it yielded summary information that I took to be
> legitimate.

   That is what I would expect.

  Ben Bolker

> 
> Thanks for the help,
> 
> ..................................................................
> 
> David Stainbrook
> M.S. Graduate Research Assistant
> Pennsylvania Cooperative Fish & Wildlife Research Unit
> The Pennsylvania State University
> 
> ..................................................................
> 
> 
> On Mon, Nov 1, 2010 09:53 PM, *Ben Bolker <bbolker at gmail.com>* wrote:
> 
>       [cc'ing r-sig-mixed-models: it's best to keep sending replies back to
>     the list so they can be archived and others can read them, or offer input]
> 
>     On 10-11-01 01:28 PM, David Stainbrook wrote:
>     > Ben,
>     > 
>     > Thanks for your input. I read that article that you suggested and it
>     > appears they used SAS and Genstat to do their analysis. 
> 
>       Yes (although as I said at the time, I wouldn't actually trust the
>     methods that they used in Genstat for this problem.  I just think their
>     description of the problem is clear).
> 
>     > Is it possible
>     > to use the Poisson-lognormal model in R or translate the model to this
>     > using R and lmer? Another professor mentioned that I may be able to get
>     > it to work using a negative binomial model in SAS or ADModel Builder.
>     > What do you suggest?
> 
>       Yes, you can use the Poisson-lognormal in recent versions of lme4,
>     simply by including an individual-level random variable.  You may get
>     warnings.
>       You could indeed use a negative binomial model in SAS or AD Model
>     Builder (in ADMB you could also use the lognormal-Poisson model).
> 
>     > Do you have any idea why Doug allowed the lmer function to fit
>     > quasipoisson if he doesn't feel that the results will be reliable? I
>     > would have trusted my results and wouldn't have had any idea that they
>     > might have been unreliable if he had not said that.
> 
>       I believe he implemented it a while ago and his opinions have now
>     changed. (I agree that it might be a good idea to disable this
>     functionality.)
> 
>     > Also, do you have any idea how to increase both the default number of
>     > function evaluations and iterations with the control statement within
>     > the lmer model statement?
> 
>     ...,control=list(maxIter=2000,maxFN=3000),... should work.
> 
>        * you seem to be tackling a difficult problem.  I appreciate that
>     you're offering full details on your problem (full scripts and data),
>     but it's going to take someone else at least half an hour (and probably
>     quite a bit more) to get up to speed on what you're doing and what's not
>     working; unfortunately, that's more than most anyone has time for,
>     unless the problem happens to be something very close to their
>     interests. Unfortunately, you may well need to find local help for this
>     (your advisor? a friendly stats professor or graduate student?) - <I already exhausted those options, hence contacting you and Doug>
>       * it's possible, depending on the complexity of your model, that
>     you're simply trying to fit too complicated a model.  You do have a lot
>     of data points, but some of your covariates may be strongly correlated.
>     Have you tried:
>        - seeing if you can successfully fit a subset of the data points
>     (this could be faster, allowing you to debug quicker)?
>        - seeing if you can successfully fit a subset of the covariates, or
>     which covariates or combinations of covariates are problematic?
>        - seeing if you can successfully fit a non-mixed (GLM) model,
>     treating 'individual' as a fixed effect?
>        - simulating data, possibly in a simplified form, to see if you can
>     get the right answer when you know what it is?
>       * lme4 is quite finicky about convergence, on the philosophy that it's
>     better not to give an answer than to give a wrong one.
> 
>       R does have its advantages, but if you're up to working with SAS or AD
>     Model Builder I would recommend you also try those approaches -- see if
>     you run into the same problems.  But I would definitely try some of the
>     trouble-shooting strategies above, first.
> 
>       good luck,
>         Ben Bolker
> 
> 
>     > Thanks again,
>     > 
>     > David
> 
>     > 
>     > On Thu, Oct 28, 2010 04:49 PM, *Ben Bolker <bbolker at gmail.com>*
>     wrote:
>     > 
>     >        My advice would be to use an individual-level random variable
>     >     (translating to a lognormal-Poisson model, which is qualitatively
>     >     similar to a negative binomial) -- see e.g. Elston et al 2001 for
>     a
>     >     decent explanation, although you should not necessarily trust the
>     >     numeric methods they use ...
>     > 
>     >      [Elston, D. A., R. Moss, T. Boulinier, C. Arrowsmith, and X. Lambin.
>     >     2001. Analysis of Aggregation, a Worked Example: Numbers of Ticks on
>     Red
>     >     Grouse Chicks. Parasitology 122, no. 05: 563-569.
>     >     doi:10.1017/S0031182001007740.
>     >    
>     http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=82701.]
>     > 
>     > 
>     >       (Doug, thanks for the vote of confidence!)
>     > 
>     >       cheers
>     >         Ben
>     > 
>     > 
>     > 
>     > 
>     > 
> 
> 
> 
>