[R-sig-ME] Observation-level random effect to model overdispersion
Ben Bolker
bbolker at gmail.com
Mon Mar 21 13:12:22 CET 2011
On 11-03-21 07:51 AM, M.S.Muller wrote:
> Dear all,
>
> I'm trying to analyze some strongly overdispersed Poisson-distributed
> data using R's mixed effects model function "lmer". Recently, several
> people have suggested incorporating an observation-level random
> effect, which would model the excess variation and solve the problem
> of underestimated standard errors that arises with overdispersed
> data. It seems to be working, but I feel uneasy using this method
> because I don't actually understand conceptually what it is doing.
> Does it package up the extra, non-Poisson variation into a miniature
> variance component for each data point? But then I don't understand
> how one ends up with non-zero residuals and why one can't just do
> this for any analyses (even with normally-distributed data) in which
> one would like to reduce noise.
>
> I may be way off base here, but does this approach model some kind of
> mixture distribution that's a combination of Poisson and whatever
> distribution the extra variation is? I've read that people often use
> a negative binomial distribution (aka Poisson-gamma) to model
> overdispersed count data in which they assume that the process is
> Poisson (so they use a log link) but the extra variation is a gamma
> distribution (in which variance is proportional to square of the
> mean). The frequently referred to paper by Elston et al (2001)
> describes modeling a Poisson-lognormal distribution in which
> overdispersion arises from errors taking on a lognormal distribution.
> Is the approach of using the observation-level random effect doing
> something similar, and simply assuming some kind of Poisson-normal
> mixed distribution? Does this approach therefore assume that the
> observation-level variance is normally distributed?
Exactly.
The observation-level random effect approach is equivalent to assuming
that the individual observations follow a [link]-normal-[family]
distribution, i.e. the specified distribution family compounded with a
normal distribution (on the link scale) that has been transformed by
the inverse link function.
Sorry that's a bit clunky, but it translates to what you said above --
* lognormal-Poisson for Poisson with log link;
* logit-normal-binomial for binomial with logit link;
etc.
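
To make the lognormal-Poisson case concrete, here is a minimal
simulation sketch (in Python with only the standard library, not lme4
code). The names `rpois`, `simulate` and `mean_var`, and the values
eta = 1 and sigma = 0.5, are illustrative assumptions and do not come
from this thread. Each observation gets its own normal deviate on the
log scale, which is exponentiated and used as the Poisson mean. The
sample moments are then compared with the standard lognormal-Poisson
results, mean = exp(eta + sigma^2/2) and
variance = mean + mean^2 * (exp(sigma^2) - 1).

```python
import math
import random


def rpois(lam, rng):
    """Draw one Poisson(lam) variate (Knuth's method; fine for modest lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate(eta, sigma, n, seed=1):
    """Simulate n counts with an observation-level normal effect on the log scale."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n):
        eps = rng.gauss(0.0, sigma)   # observation-level random effect (link scale)
        lam = math.exp(eta + eps)     # inverse of the log link
        ys.append(rpois(lam, rng))    # Poisson draw given this observation's mean
    return ys


def mean_var(ys):
    """Sample mean and unbiased sample variance."""
    m = sum(ys) / len(ys)
    v = sum((y - m) ** 2 for y in ys) / (len(ys) - 1)
    return m, v


eta, sigma, n = 1.0, 0.5, 50000   # assumed values, for illustration only

# Lognormal-Poisson: the variance should exceed the mean.
m, v = mean_var(simulate(eta, sigma, n))
mu_theory = math.exp(eta + sigma ** 2 / 2)
var_theory = mu_theory + mu_theory ** 2 * (math.exp(sigma ** 2) - 1)

# Plain Poisson (sigma = 0) for comparison: variance roughly equals the mean.
m0, v0 = mean_var(simulate(eta, 0.0, n))

print("lognormal-Poisson: mean", m, "variance", v, "ratio", v / m)
print("theory:            mean", mu_theory, "variance", var_theory)
print("plain Poisson:     mean", m0, "variance", v0, "ratio", v0 / m0)
```

In an lme4 fit the analogous step is usually a grouping factor with one
level per row, entered as a random intercept, e.g. a term like
`(1 | obs)` where `obs` is a hypothetical per-observation identifier.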
For what it's worth, the Elston paper is philosophically sensible but
I'm not sure that it's computationally sound; as I have said before
<https://stat.ethz.ch/pipermail/r-sig-mixed-models/2010q2/003967.html>,
using PQL with observation-level random effects is explicitly
*dis*recommended in the Genstat documentation; I had convergence
problems fitting the data in lme4, and MCMCglmm told me that the data
were under-specified and I should consider a more informative prior ...
(I see Jarrod Hadfield has just answered this question too.)
>
> If anyone could give me any guidance on this, I would appreciate it
> very much.
>
> Martina Muller