[R-sig-ME] over-dispersed Poisson with lmer -- a trick, with a catch?

Thu Jul 16 18:01:15 CEST 2009

Hi all--

I'm currently working on analyses of longitudinal count data (number of 
drinking related problems over 2 years in a randomized trial for alcohol 
interventions).

The data are pretty clearly over-dispersed relative to Poisson, which is 
  clear when I fit the model using MCMCglmm, which includes a residual 
error term to account for over-dispersion (thank you, Jarrod!).

However, I'm collaborating with a colleague who uses Stata.  He 
mentioned that Sophia Rabe-Hesketh in her mixed-models book discusses 
fitting over-dispersed Poisson mixed-effects models using the following 
"trick":

Create a new variable with a unique value for each observation and 
include this as an additional random-effect term.

Basically, this will soak up any residual dispersion, over and above 
that accounted for in Poisson model and other random-effects.  On the 
face of it, it makes sense to me.  Though, we hit a snag in trying to do 
this in lmer().  Using the data that I have been analyzing:

### create new var with unique value for each obs/row
 > snap.df$over <- 1:nrow(snap.df)

### include as separate random-effect in model
 > rapi.glmer2.1 <- glmer(rapisum ~ asex*time + (time|ID) + (1|over),
+ 					data = snap.df, verbose = TRUE,
+ 					family = poisson)
Error in mer_finalize(ans) : q = 5252 > n = 3616

So, we already have a random intercept and slope for time [ie, 
(time|ID)].  With 818 participants the random-effects specify 818 
intercepts plus 818 slopes, and now with an observation level 
random-effect [ie, (1|over)] with 3616 observations... we get 5252 
estimates.  lmer() doesn't like it.

I know there's been a bit of chatter about this error before, and I 
believe Doug mentioned that it was to avoid people radically 
over-parameterizing/over-fitting their data via random-effects.  Though, 
I believe there is some room for debate (ie, the model above does not 
fit 5252 unique parameters).

Anyhow, I would be curious for any input on:

1. Does this seem like a sensible approach to account for 
over-dispersion?  (caveat: I don't have Rabe-Hesketh's book, so just 
going on recounting from colleague)

2. If so, I wonder if we might be able to coax Doug to change the error 
to an argument in the model (that defaults to the error msg but could be 
over-ridden)?

cheers, Dave

-- 
Dave Atkins, PhD
Research Associate Professor
Center for the Study of Health and Risk Behaviors
Department of  Psychiatry and Behavioral Science
University of Washington
1100 NE 45th Street, Suite 300
Seattle, WA  98105
206-616-3879
datkins at u.washington.edu