[R-sig-ME] Poisson mixed models: Non-integer response variable in lmer?

Tue Mar 15 02:39:15 CET 2011

On 11-03-14 07:14 PM, Daniel Barton wrote:
> Hello,
>      Thanks to everyone who contributes to this list!  I often find random
> questions I have answered in the archives of this list.
> 
> My specific question of the moment, a simplified example of what I'm doing
> that I hope illustrates my question...
> 
>      If we have a poisson-distributed response variable in a mixed model
> such as called by:
> 
> lmer(amrotot ~ year + (year|route), family=poisson(link=log))
> 
>     where amrotot is an integer count, year is, well, the year (as a linear
> predictor, not a factor) and route is a sampling unit.  If 'exposure' varies
> by route, we can define another model with an offset such as:
> 
> lmer(amrotot ~ year + (year|route), offset=log(effort), family=poisson(link=log))
> 
>      this all seems, generally good and fine.  A colleague asked me why not
> use (amrotot/effort) as the response variable, but this of course results in
> a non-integer response variable.  Yet it turns out, lmer (or glm, for that
> matter) will indeed estimate a model using the non-integer response variable
> (amrotot/effort) but gives warnings.  I understand that poisson regression
> assumes a poisson-distributed integer response variable, but I was curious
> about *why* lmer would provide results for non-integer response variables
> such as (amrotot/effort) and if these results are valid or somehow
> comparable to results where amrotot is the response and effort is an offset,
> with special reference to the confidence intervals of the random effects.
> Using non-integer response variables in poisson regression looks and seems
> wrong to me, but IANA statistician and maybe lmer is doing something I don't
> quite get to make this work.

  It won't work: there's a reason that Poisson models are restricted
to count data.  In particular, in Poisson models the assumption is that
the (expected) variance is equal to the (expected) mean for any data
point: if you rescale the data points, then the variance-to-mean
relationship will change with the units used, something you probably
don't want.

  e.g. if the sampling period is 1 hour and you have an expected count
of 1 in the sampling period, the mean and variance will both be 1
(unitless); if you divide the counts by 60 to get counts per minute,
then the mean will be scaled to 1/60 counts/minute but the variance
will be scaled to 1/3600 (counts/minute)^2, so the two are no longer
equal.
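
  A quick simulation makes the scaling problem concrete.  This is only
a sketch in base R: the seed, the sample size and the variable names are
arbitrary choices for illustration, not anything from the original data.

```r
# Simulated counts per 1-hour sampling period, Poisson with mean 1
# (seed and sample size are arbitrary, chosen only for illustration)
set.seed(101)
counts_per_hour <- rpois(1e5, lambda = 1)

# The same data re-expressed as counts per minute
counts_per_min <- counts_per_hour / 60

# Variance-to-mean ratio on each scale
vm_hour <- var(counts_per_hour) / mean(counts_per_hour)
vm_min  <- var(counts_per_min)  / mean(counts_per_min)

# The hourly ratio should be close to 1; the per-minute ratio is the
# hourly ratio divided by 60, so it can no longer match a Poisson
print(c(hour = vm_hour, minute = vm_min))
```

The data are identical in both cases; only the units changed, yet the
rescaled values no longer satisfy variance = mean.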

You ask why glmer (and glm) lets you do this. It's generally difficult
to decide if one should prohibit, or just warn about, a practice that
seems odd. Sometimes there are indeed plausible scenarios (although to
be honest I can't think of one in this case ...) where someone wants to
use the software in a way not intended by the designers.  I won't say
that R is completely consistent in this regard, but overall the
philosophy of "you are assumed to know what you are doing, we will warn
you but not stop you if you seem to be doing something silly" is reasonable.
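
  The warn-but-don't-stop behaviour can be seen with a small simulated
example.  This is a hedged sketch, not the original analysis: it uses
glm() from base R with no random effects so that it runs without lme4,
and the data, the seed and the names (year, effort, count) are all
hypothetical.

```r
# Hypothetical simulated data: counts whose mean is proportional to effort
set.seed(101)
n      <- 200
year   <- rep(0:9, length.out = n)     # linear predictor, not a factor
effort <- runif(n, 1, 10)              # exposure for each observation
count  <- rpois(n, lambda = effort * exp(0.5 + 0.1 * year))

# Counts as the response, exposure entering as an offset on the log scale
fit_offset <- glm(count ~ year, offset = log(effort),
                  family = poisson(link = "log"))

# Rate as the response: glm() still fits, but warns about non-integer values.
# The handler collects the warning messages instead of printing them all.
warn_msgs <- character(0)
fit_ratio <- withCallingHandlers(
  glm(I(count / effort) ~ year, family = poisson(link = "log")),
  warning = function(w) {
    warn_msgs <<- c(warn_msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# Standard errors from the two fits, side by side
se_offset <- summary(fit_offset)$coefficients[, "Std. Error"]
se_ratio  <- summary(fit_ratio)$coefficients[, "Std. Error"]
print(length(warn_msgs))
print(rbind(offset = se_offset, ratio = se_ratio))
```

Both calls return a fitted model, but the standard errors differ
because the ratio fit applies the variance = mean assumption on the
rescaled (per-unit-effort) scale.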

  Ben Bolker