[R-sig-ME] Dealing with Overdispersion in Count Data with Mixed Modeling

Tue Nov 23 22:41:07 CET 2010

  Quick answer: this is most often a symptom of overfitted models and/or
strongly correlated predictors.  Your intercepts and slopes are likely
to be strongly correlated, as are linear and quadratic terms, if the
means of your predictors are far from zero.

  A few quick things to try:
    * center (and possibly scale) your continuous predictors
    * use poly(covx,2) rather than (1+covx+I(covx^2) [I think this
should work for both fixed and random effects]
    * if covx and covz (or their effects) are likely to be correlated,
try substituting (1|LT)+(0+covx|LT)+(0+covz|LT) for the (1+covx+covz|LT)
term (this is a different, simpler, model but one that is likely to be
easier to fit -- if it works you can worry about returning to the more
complex model later).
    * if desperate/willing to invest quite a bit more time, try
simulating data from the model under a best-case scenario and see if you
can get it to work (and give the right answer) under those conditions.

  good luck
    Ben Bolker

On 11/23/2010 04:23 PM, David Stainbrook wrote:
> Ben,
> 
> I have updated both the version or R and the lme4 library, but receive
> error messages and no summary output for models with more than one
> covariate.
> [R version 2.12.0 (2010-10-15) and lme4_0.999375-37] 
> 
> 
> The two simple models I described before worked, but when I add another
> covariate or a squared term, it does not and I get warning messages.
> 
> These two worked:
> model_1<-lmer(Y~1 + (1|LT) + (1|Observation), data=df3, family=poisson)
> model_2<-lmer(Y~1 + (1+covx|LT) + (1|Observation) + covx, data=df3,
> family=poisson)
> 
> I get the following message: "Number of levels of a grouping factor for
> the random effects is *equal* to n, the number of observations." I don't
> get any warning messages and I get a summary output.
> 
> 
> 
> These fails to work:
> model_3<-lmer(Y~1 + (1+covx + I(covx^2)|LT) + (1|Observation) + covx +
> I(covx^2), data=df3, family=poisson)
> model_4<-lmer(Y~1 + (1+covx + covz |LT) + (1|Observation) + covx + covz,
> data=df3, family=poisson)
> 
> I get the same message but now with warning messages and no summary output:
> "Number of levels of a grouping factor for the random effects
> is *equal* to n, the number of observations
> Warning messages:
> 1: In mer_finalize(ans) :
>   Cholmod warning 'not positive definite' at
> file:../Cholesky/t_cholmod_rowfac.c, line 432
> 2: In mer_finalize(ans) :
>   Cholmod warning 'not positive definite' at
> file:../Cholesky/t_cholmod_rowfac.c, line 432
> 3: In mer_finalize(ans) :
>   Cholmod warning 'not positive definite' at
> file:../Cholesky/t_cholmod_rowfac.c, line 432
> 
>> summary(model_3)
> Error in asMethod(object) : matrix is not symmetric [1,2]"
> 
> Do you know why this is and do you know how I can get a summary output
> for more complex models?
> 
> Thanks,
> David
> 
> 
> 
> 
> 
> 
> 
> 
> On Thu, Nov 11, 2010 02:43 PM, *Ben Bolker <bbolker at gmail.com>* wrote:
> 
>     On 11/11/2010 02:11 PM, David Stainbrook wrote:
>     > Ben,
>     > 
>     > Putting aside the issues of model convergence, I would like to focus on
>     > the issue of how to deal with overdispersion in count data. Typically, a
>     > negative binomial or quasi-poisson model deals with overdispersion in
>     > situations where it is not appropriate to use Poisson. However, to my
>     > knowledge, lme4 does not allow multiple random effects using the
>     > negative binomial family 
> 
>       [correction: it doesn't implement negative binomial models at all,
>     although I believe that *in* principle this could be added by using an
>     additional 'k' parameter that controlled the mean-variance relationship,
>     in a way analogous to glm.nb() in the MASS package].
> 
>     and quasi-poisson has issues with the
>     > likelihood estimation. As you suggested in a previous email, and from
>     > the article at
>     >
>     <http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=82701
>     > <http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=82701>,
>     > translating the model to a lognormal-Poisson model by using an
>     > individual-level random variable and using family=poisson should deal
>     > with the overdispersion.
> 
>       Yes.
> 
>     > 
>     > I wanted to clarify what you meant by the individual level to make sure
>     > that I am doing it correctly. I already have a random intercept for each
>     > individual (1|LT), where LT=lowest tag number for each individual, to
>     > account for pseudoreplication. For the individual-level random variable
>     > to deal with overdispersion, did you mean adding (1|Observation), where
>     > Observation is a vector from 1 to the total number of observations or
>     > records in the dataset?
> 
>       yes, exactly.
> 
>     > 
>     > This is demonstrated here in a very simple model with no covariates:
>     > model_1<-lmer(Y~1 + (1|LT) + (1|Observation), data=df3, family=poisson)
>     > 
>     > and here with an additional covariate (covx) with both fixed and random
>     > effects:
>     > model_2<-lmer(Y~1 + (1+covx|LT) + (1|Observation) + covx, data=df3, 
>     > family=poisson)
>     > 
>     > Is this what you meant and is it coded correctly?
>     > 
>     > I tried this code with an older version of R and lme4, and it yielded an
>     > error and no summary data, but after updating both the version of R and
>     > the lme4 library, I received a warning "Number of levels of a grouping
>     > factor for the random effects is *equal* to n, the number of
>     > observations", but it yielded summary information that I took to be
>     > legitimate.
> 
>        That is what I would expect.
> 
>       Ben Bolker
> 
> 
>     > 
>     > Thanks for the help,
>     > 
>     > ..................................................................
>     > 
>     > David Stainbrook
>     > M.S. Graduate Research Assistant
>     > Pennsylvania Cooperative Fish & Wildlife Research Unit
>     > The Pennsylvania State University
>     > 
>     > ..................................................................
>     > 
>     > 
>     > On Mon, Nov 1, 2010 09:53 PM, *Ben Bolker <bbolker at gmail.com>* wrote:
>     > 
>     >       [cc'ing r-sig-mixed-models: it's best to keep sending replies back to
>     >     the list so they can be archived and others can read them, or offer input]
>     > 
>     >     On 10-11-01 01:28 PM, David Stainbrook wrote:
>     >     > Ben,
>     >     > 
>     >     > Thanks for your input. I read that article that you suggested and it
>     >     > appears they used SAS and Genstat to do their analysis. 
>     > 
>     >       Yes (although as I said at the time, I wouldn't actually trust the
>     >     methods that they used in Genstat for this problem.  I just think their
>     >     description of the problem is clear).
>     > 
>     >     > Is it possible
>     >     > to use the Poisson-lognormal model in R or translate the model to this
>     >     > using R and lmer? Another professor mentioned that I may be able to get
>     >     > it to work using a negative binomial model in SAS or ADModel Builder.
>     >     > What do you suggest?
>     > 
>     >       Yes, you can use the Poisson-lognormal in recent versions of lme4,
>     >     simply by including an individual-level random variable.  You may get
>     >     warnings.
>     >       You could indeed use a negative binomial model in SAS or AD Model
>     >     Builder (in ADMB you could also use the lognormal-Poisson model).
>     > 
>     >     > Do you have any idea why Doug allowed the lmer function to fit
>     >     > quasipoisson if he doesn't feel that the results will be reliable? I
>     >     > would have trusted my results and wouldn't have had any idea that they
>     >     > might have been unreliable if he had not said that.
>     > 
>     >       I believe he implemented it a while ago and his opinions have now
>     >     changed. (I agree that it might be a good idea to disable this
>     >     functionality.)
>     > 
>     >     > Also, do you have any idea how to increase both the default number of
>     >     > function evaluations and iterations with the control statement within
>     >     > the lmer model statement?
>     > 
>     >     ...,control=list(maxIter=2000,maxFN=3000),... should work.
>     > 
>     >        * you seem to be tackling a difficult problem.  I appreciate that
>     >     you're offering full details on your problem (full scripts and data),
>     >     but it's going to take someone else at least half an hour (and probably
>     >     quite a bit more) to get up to speed on what you're doing and what's not
>     >     working; unfortunately, that's more than most anyone has time for,
>     >     unless the problem happens to be something very close to their
>     >     interests. Unfortunately, you may well need to find local help for this
>     >     (your advisor? a friendly stats professor or graduate student?) - <I already exhausted those options, hence contacting you and Doug>
>     >       * it's possible, depending on the complexity of your model, that
>     >     you're simply trying to fit too complicated a model.  You do have a lot
>     >     of data points, but some of your covariates may be strongly correlated.
>     >     Have you tried:
>     >        - seeing if you can successfully fit a subset of the data points
>     >     (this could be faster, allowing you to debug quicker)?
>     >        - seeing if you can successfully fit a subset of the covariates, or
>     >     which covariates or combinations of covariates are problematic?
>     >        - seeing if you can successfully fit a non-mixed (GLM) model,
>     >     treating 'individual' as a fixed effect?
>     >        - simulating data, possibly in a simplified form, to see if you can
>     >     get the right answer when you know what it is?
>     >       * lme4 is quite finicky about convergence, on the philosophy that it's
>     >     better not to give an answer than to give a wrong one.
>     > 
>     >       R does have its advantages, but if you're up to working with SAS or AD
>     >     Model Builder I would recommend you also try those approaches -- see if
>     >     you run into the same problems.  But I would definitely try some of the
>     >     trouble-shooting strategies above, first.
>     > 
>     >       good luck,
>     >         Ben Bolker
>     > 
>     > 
>     >     > Thanks again,
>     >     > 
>     >     > David
>     > 
>     >     > 
>     >     > On Thu, Oct 28, 2010 04:49 PM, *Ben Bolker <bbolker at gmail.com>*
>     >     wrote:
>     >     > 
>     >     >        My advice would be to use an individual-level random variable
>     >     >     (translating to a lognormal-Poisson model, which is qualitatively
>     >     >     similar to a negative binomial) -- see e.g. Elston et al 2001 for
>     >     a
>     >     >     decent explanation, although you should not necessarily trust the
>     >     >     numeric methods they use ...
>     >     > 
>     >     >      [Elston, D. A., R. Moss, T. Boulinier, C. Arrowsmith, and X. Lambin.
>     >     >     2001. Analysis of Aggregation, a Worked Example: Numbers of Ticks on
>     >     Red
>     >     >     Grouse Chicks. Parasitology 122, no. 05: 563-569.
>     >     >     doi:10.1017/S0031182001007740.
>     >     >    
>     >     http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=82701.]
>     >     > 
>     >     > 
>     >     >       (Doug, thanks for the vote of confidence!)
>     >     > 
>     >     >       cheers
>     >     >         Ben
>     >     > 
>     >     > 
>     >     > 
>     >     > 
>     >     > 
>     > 
>     > 
>     > 
>     > 
> 
> 
> 
>