[R-sig-ME] Poisson mixed models: Non-integer response variable in lmer?

Ben Bolker bbolker at gmail.com
Tue Mar 15 14:36:17 CET 2011

  [cc'ing back to r-sig-mixed]

On 03/15/2011 01:37 AM, Daniel Barton wrote:
> Thanks for your response Ben.  I spent some time playing some simulation
> games in R with vectors drawn from distributions with known parameters
> and then scaling them for giggles, but as you noted, upsetting the
> expected mean-variance relationship was clearly the important issue from
> the get-go (dividing a vector of poisson distributed rates makes them
> underdispersed!).  Yet my friend/colleague made this very strange
> argument: if /scaling the data creates poisson-distributed rates/ from
> overdispersed count data, i.e. if we divide the original data with say a
> mean of 3 and variance of 30 by 8, then isn't this the right thing to
> do?  

  I get the general point, although I guess in this case you would want
to divide the original data by 10 (mean = 3/10 = var = 30/100) ?
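
  A quick way to check that arithmetic is to simulate it. This is only a
sketch under stated assumptions: the negative binomial is an assumed
stand-in for "overdispersed counts with mean 3 and variance 30", and
scaled_moments() is a made-up helper, not something from lme4 or from the
original data.

```r
# Sketch only: simulated data, hypothetical helper name.
set.seed(1)
mu   <- 3
size <- 1/3                          # var = mu + mu^2/size = 3 + 9/(1/3) = 30
y    <- rnbinom(1e5, mu = mu, size = size)

# Mean, variance and variance/mean ratio of x after dividing by a constant.
scaled_moments <- function(x, divisor) {
  z <- x / divisor
  c(mean = mean(z), var = var(z), ratio = var(z) / mean(z))
}

m1  <- scaled_moments(y, 1)    # raw counts: variance/mean ratio roughly 10
m8  <- scaled_moments(y, 8)    # ratio roughly 10/8, still above 1
m10 <- scaled_moments(y, 10)   # ratio roughly 1, i.e. mean and variance agree
print(round(rbind(raw = m1, by8 = m8, by10 = m10), 3))

# Matching the first two moments does not make the values Poisson:
# a Poisson variable only takes non-negative integer values.
all(y / 10 == round(y / 10))   # FALSE unless every count is a multiple of 10
```

  Dividing by c divides the variance/mean ratio by c, so the divisor that
gives ratio 1 is just the original ratio (30/3 = 10), and it changes
whenever the units of the data change.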

> This just seemed against all of my training (even though as I noted
> I'm not a statistician) because the point there is that the original
> count data is overdispersed, not that dividing it by some effort
> variable makes it seem poisson distributed (even though it's now rate
> data... odd).  Is there some critical reference that I've missed that I
> could just point to that suggests /not /to engage in such strange practices?

   I don't think there's a reference: welcome to the cutting edge ... I
agree (and was almost going to mention) that under other circumstances
(quasi-likelihood estimation), we do almost the equivalent of this
scaling in order to remove overdispersion.  I don't think this will
necessarily work right (I haven't thought it all the way through) with
sampling periods of different lengths/sizes, though.
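
   For comparison, the quasi-likelihood route mentioned above can be
sketched with glm() on simulated data. This is fixed effects only, so it is
not the mixed model from the thread, and the data-generating values and
variable names are assumptions made for illustration. Note that with a log
link the exposure enters on the log scale, as offset(log(effort)).

```r
# Sketch only: simulated overdispersed counts with unequal sampling effort.
set.seed(2)
n      <- 500
effort <- runif(n, 1, 10)                  # assumed exposure per sample
year   <- rep(0:9, length.out = n)
mu     <- effort * exp(0.5 + 0.05 * year)  # expected count grows with effort
y      <- rnbinom(n, mu = mu, size = 2)    # variance exceeds the mean

fit_pois  <- glm(y ~ year + offset(log(effort)), family = poisson)
fit_quasi <- glm(y ~ year + offset(log(effort)), family = quasipoisson)

phi      <- summary(fit_quasi)$dispersion  # estimated scale factor
se_pois  <- summary(fit_pois)$coefficients[, "Std. Error"]
se_quasi <- summary(fit_quasi)$coefficients[, "Std. Error"]

print(rbind(poisson = coef(fit_pois), quasi = coef(fit_quasi)))  # same estimates
print(phi)                                 # greater than 1 for these data
print(se_quasi / se_pois)                  # each equals sqrt(phi)
```

   The quasi-Poisson fit keeps integer counts as the response: the point
estimates are unchanged and only the standard errors are inflated by
sqrt(phi), so nothing is divided out of the data.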

   Ben Bolker

> Best,
> Dan Barton
> On Mon, Mar 14, 2011 at 6:39 PM, Ben Bolker <bbolker at gmail.com> wrote:
>     On 11-03-14 07:14 PM, Daniel Barton wrote:
>     > Hello,
>     >      Thanks to everyone who contributes to this list!  I often find
>     > random questions I have answered in the archives of this list.
>     >
>     > My specific question of the moment, a simplified example of what I'm
>     > doing that I hope illustrates my question...
>     >
>     >      If we have a poisson-distributed response variable in a mixed
>     > model such as called by:
>     >
>     > lmer(amrotot ~ year + (year|route), family=poisson(link=log))
>     >
>     >     where amrotot is an integer count, year is, well, the year (as a
>     > linear predictor, not a factor) and route is a sampling unit.  If
>     > 'exposure' varies by route, we can define another model with an
>     > offset such as:
>     >
>     > lmer(amrotot ~ year + (year|route), offset=effort, family=poisson(link=log))
>     >
>     >      this all seems, generally good and fine.  A colleague asked me
>     > why not use (amrotot/effort) as the response variable, but this of
>     > course results in a non-integer response variable.  Yet it turns out,
>     > lmer (or glm, for that matter) will indeed estimate a model using the
>     > non-integer response variable (amrotot/effort) but gives warnings.  I
>     > understand that poisson regression assumes a poisson-distributed
>     > integer response variable, but I was curious about *why* lmer would
>     > provide results for non-integer response variables such as
>     > (amrotot/effort) and if these results are valid or somehow comparable
>     > to results where amrotot is the response and effort is an offset,
>     > with special reference to the confidence intervals of the random
>     > effects.  Using non-integer response variables in poisson regression
>     > looks and seems wrong to me, but IANA statistician and maybe lmer is
>     > doing something I don't quite get to make this work.
>
>      It won't work: there's a reason that Poisson models are restricted
>     to count data.  The assumption in these models is that the (expected)
>     variance is equal to the (expected) mean for any data point: if you
>     rescale the data points, then the variance-to-mean relationship will
>     change with the units used, something you probably don't want.
>
>      e.g. if the sampling period is 1 hour and you have 1 count in the
>     sampling period, the mean and variance will both be 1 (unitless); if you
>     divide the counts by 60 to get counts per minute, then the mean will be
>     scaled to 1/60 counts/minute but the variance will be scaled to
>     1/3600 (counts/minute)^2, so the two are no longer equal ....
>
>     You ask why glmer (and glm) let you do this. It's generally difficult
>     to decide if one should prohibit, or just warn about, a practice that
>     seems odd. Sometimes there are indeed plausible scenarios (although to
>     be honest I can't think of one in this case ...) where someone wants to
>     use the software in a way not intended by the designers.  I won't say
>     that R is completely consistent in this regard, but overall the
>     philosophy of "you are assumed to know what you are doing, we will warn
>     you but not stop you if you seem to be doing something silly" is
>     reasonable.
>
>      Ben Bolker
