[R-sig-ME] Dealing with Overdispersion in Count Data with Mixed Modeling
Ben Bolker
bbolker at gmail.com
Tue Nov 23 22:41:07 CET 2010
Quick answer: this is most often a symptom of overfitted models and/or
strongly correlated predictors. Your intercepts and slopes are likely
to be strongly correlated, as are linear and quadratic terms, if the
means of your predictors are far from zero.
A few quick things to try:
* center (and possibly scale) your continuous predictors
* use poly(covx,2) rather than (1+covx+I(covx^2) [I think this
should work for both fixed and random effects]
* if covx and covz (or their effects) are likely to be correlated,
try substituting (1|LT)+(0+covx|LT)+(0+covz|LT) for the (1+covx+covz|LT)
term (this is a different, simpler, model but one that is likely to be
easier to fit -- if it works you can worry about returning to the more
complex model later).
* if desperate/willing to invest quite a bit more time, try
simulating data from the model under a best-case scenario and see if you
can get it to work (and give the right answer) under those conditions.
good luck
Ben Bolker
On 11/23/2010 04:23 PM, David Stainbrook wrote:
> Ben,
>
> I have updated both the version or R and the lme4 library, but receive
> error messages and no summary output for models with more than one
> covariate.
> [R version 2.12.0 (2010-10-15) and lme4_0.999375-37]
>
>
> The two simple models I described before worked, but when I add another
> covariate or a squared term, it does not and I get warning messages.
>
> These two worked:
> model_1<-lmer(Y~1 + (1|LT) + (1|Observation), data=df3, family=poisson)
> model_2<-lmer(Y~1 + (1+covx|LT) + (1|Observation) + covx, data=df3,
> family=poisson)
>
> I get the following message: "Number of levels of a grouping factor for
> the random effects is *equal* to n, the number of observations." I don't
> get any warning messages and I get a summary output.
>
>
>
> These fails to work:
> model_3<-lmer(Y~1 + (1+covx + I(covx^2)|LT) + (1|Observation) + covx +
> I(covx^2), data=df3, family=poisson)
> model_4<-lmer(Y~1 + (1+covx + covz |LT) + (1|Observation) + covx + covz,
> data=df3, family=poisson)
>
> I get the same message but now with warning messages and no summary output:
> "Number of levels of a grouping factor for the random effects
> is *equal* to n, the number of observations
> Warning messages:
> 1: In mer_finalize(ans) :
> Cholmod warning 'not positive definite' at
> file:../Cholesky/t_cholmod_rowfac.c, line 432
> 2: In mer_finalize(ans) :
> Cholmod warning 'not positive definite' at
> file:../Cholesky/t_cholmod_rowfac.c, line 432
> 3: In mer_finalize(ans) :
> Cholmod warning 'not positive definite' at
> file:../Cholesky/t_cholmod_rowfac.c, line 432
>
>> summary(model_3)
> Error in asMethod(object) : matrix is not symmetric [1,2]"
>
> Do you know why this is and do you know how I can get a summary output
> for more complex models?
>
> Thanks,
> David
>
>
>
>
>
>
>
>
> On Thu, Nov 11, 2010 02:43 PM, *Ben Bolker <bbolker at gmail.com>* wrote:
>
> On 11/11/2010 02:11 PM, David Stainbrook wrote:
> > Ben,
> >
> > Putting aside the issues of model convergence, I would like to focus on
> > the issue of how to deal with overdispersion in count data. Typically, a
> > negative binomial or quasi-poisson model deals with overdispersion in
> > situations where it is not appropriate to use Poisson. However, to my
> > knowledge, lme4 does not allow multiple random effects using the
> > negative binomial family
>
> [correction: it doesn't implement negative binomial models at all,
> although I believe that *in* principle this could be added by using an
> additional 'k' parameter that controlled the mean-variance relationship,
> in a way analogous to glm.nb() in the MASS package].
>
> and quasi-poisson has issues with the
> > likelihood estimation. As you suggested in a previous email, and from
> > the article at
> >
> <http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=82701
> > <http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=82701>,
> > translating the model to a lognormal-Poisson model by using an
> > individual-level random variable and using family=poisson should deal
> > with the overdispersion.
>
> Yes.
>
> >
> > I wanted to clarify what you meant by the individual level to make sure
> > that I am doing it correctly. I already have a random intercept for each
> > individual (1|LT), where LT=lowest tag number for each individual, to
> > account for pseudoreplication. For the individual-level random variable
> > to deal with overdispersion, did you mean adding (1|Observation), where
> > Observation is a vector from 1 to the total number of observations or
> > records in the dataset?
>
> yes, exactly.
>
> >
> > This is demonstrated here in a very simple model with no covariates:
> > model_1<-lmer(Y~1 + (1|LT) + (1|Observation), data=df3, family=poisson)
> >
> > and here with an additional covariate (covx) with both fixed and random
> > effects:
> > model_2<-lmer(Y~1 + (1+covx|LT) + (1|Observation) + covx, data=df3,
> > family=poisson)
> >
> > Is this what you meant and is it coded correctly?
> >
> > I tried this code with an older version of R and lme4, and it yielded an
> > error and no summary data, but after updating both the version of R and
> > the lme4 library, I received a warning "Number of levels of a grouping
> > factor for the random effects is *equal* to n, the number of
> > observations", but it yielded summary information that I took to be
> > legitimate.
>
> That is what I would expect.
>
> Ben Bolker
>
>
> >
> > Thanks for the help,
> >
> > ..................................................................
> >
> > David Stainbrook
> > M.S. Graduate Research Assistant
> > Pennsylvania Cooperative Fish & Wildlife Research Unit
> > The Pennsylvania State University
> >
> > ..................................................................
> >
> >
> > On Mon, Nov 1, 2010 09:53 PM, *Ben Bolker <bbolker at gmail.com>* wrote:
> >
> > [cc'ing r-sig-mixed-models: it's best to keep sending replies back to
> > the list so they can be archived and others can read them, or offer input]
> >
> > On 10-11-01 01:28 PM, David Stainbrook wrote:
> > > Ben,
> > >
> > > Thanks for your input. I read that article that you suggested and it
> > > appears they used SAS and Genstat to do their analysis.
> >
> > Yes (although as I said at the time, I wouldn't actually trust the
> > methods that they used in Genstat for this problem. I just think their
> > description of the problem is clear).
> >
> > > Is it possible
> > > to use the Poisson-lognormal model in R or translate the model to this
> > > using R and lmer? Another professor mentioned that I may be able to get
> > > it to work using a negative binomial model in SAS or ADModel Builder.
> > > What do you suggest?
> >
> > Yes, you can use the Poisson-lognormal in recent versions of lme4,
> > simply by including an individual-level random variable. You may get
> > warnings.
> > You could indeed use a negative binomial model in SAS or AD Model
> > Builder (in ADMB you could also use the lognormal-Poisson model).
> >
> > > Do you have any idea why Doug allowed the lmer function to fit
> > > quasipoisson if he doesn't feel that the results will be reliable? I
> > > would have trusted my results and wouldn't have had any idea that they
> > > might have been unreliable if he had not said that.
> >
> > I believe he implemented it a while ago and his opinions have now
> > changed. (I agree that it might be a good idea to disable this
> > functionality.)
> >
> > > Also, do you have any idea how to increase both the default number of
> > > function evaluations and iterations with the control statement within
> > > the lmer model statement?
> >
> > ...,control=list(maxIter=2000,maxFN=3000),... should work.
> >
> > * you seem to be tackling a difficult problem. I appreciate that
> > you're offering full details on your problem (full scripts and data),
> > but it's going to take someone else at least half an hour (and probably
> > quite a bit more) to get up to speed on what you're doing and what's not
> > working; unfortunately, that's more than most anyone has time for,
> > unless the problem happens to be something very close to their
> > interests. Unfortunately, you may well need to find local help for this
> > (your advisor? a friendly stats professor or graduate student?) - <I already exhausted those options, hence contacting you and Doug>
> > * it's possible, depending on the complexity of your model, that
> > you're simply trying to fit too complicated a model. You do have a lot
> > of data points, but some of your covariates may be strongly correlated.
> > Have you tried:
> > - seeing if you can successfully fit a subset of the data points
> > (this could be faster, allowing you to debug quicker)?
> > - seeing if you can successfully fit a subset of the covariates, or
> > which covariates or combinations of covariates are problematic?
> > - seeing if you can successfully fit a non-mixed (GLM) model,
> > treating 'individual' as a fixed effect?
> > - simulating data, possibly in a simplified form, to see if you can
> > get the right answer when you know what it is?
> > * lme4 is quite finicky about convergence, on the philosophy that it's
> > better not to give an answer than to give a wrong one.
> >
> > R does have its advantages, but if you're up to working with SAS or AD
> > Model Builder I would recommend you also try those approaches -- see if
> > you run into the same problems. But I would definitely try some of the
> > trouble-shooting strategies above, first.
> >
> > good luck,
> > Ben Bolker
> >
> >
> > > Thanks again,
> > >
> > > David
> >
> > >
> > > On Thu, Oct 28, 2010 04:49 PM, *Ben Bolker <bbolker at gmail.com>*
> > wrote:
> > >
> > > My advice would be to use an individual-level random variable
> > > (translating to a lognormal-Poisson model, which is qualitatively
> > > similar to a negative binomial) -- see e.g. Elston et al 2001 for
> > a
> > > decent explanation, although you should not necessarily trust the
> > > numeric methods they use ...
> > >
> > > [Elston, D. A., R. Moss, T. Boulinier, C. Arrowsmith, and X. Lambin.
> > > 2001. Analysis of Aggregation, a Worked Example: Numbers of Ticks on
> > Red
> > > Grouse Chicks. Parasitology 122, no. 05: 563-569.
> > > doi:10.1017/S0031182001007740.
> > >
> > http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=82701.]
> > >
> > >
> > > (Doug, thanks for the vote of confidence!)
> > >
> > > cheers
> > > Ben
> > >
> > >
> > >
> > >
> > >
> >
> >
> >
> >
>
>
>
>
More information about the R-sig-mixed-models
mailing list