[R-sig-ME] Poisson mixed models: Non-integer response variable in lmer?
Ben Bolker
bbolker at gmail.com
Tue Mar 15 14:36:17 CET 2011
[cc'ing back to r-sig-mixed]
On 03/15/2011 01:37 AM, Daniel Barton wrote:
> Thanks for your response Ben. I spent some time playing some simulation
> games in R with vectors drawn from distributions with known parameters
> and then scaling them for giggles, but as you noted, upsetting the
> expected mean-variance relationship was clearly the important issue from
> the get-go (dividing a vector of poisson distributed rates makes them
> underdispersed!). Yet my friend/colleague made this very strange
> argument: if /scaling the data creates poisson-distributed rates/ from
> overdispersed count data, i.e. if we divide the original data with say a
> mean of 3 and variance of 30 by 8, then isn't this the right thing to
> do?
I get the general point, although I guess in this case you would want
to divide the original data by 10 (mean = 3/10 = var = 30/100) ?
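
The arithmetic behind that divisor can be checked with a short simulation. This is only a sketch: the negative binomial is an assumed stand-in for "overdispersed counts with mean 3 and variance 30" (nothing in the thread says the data follow it), and the names y, mu, size and rate10 are made up for illustration. Dividing by k scales the mean by 1/k and the variance by 1/k^2, so the two match at k = 30/3 = 10 rather than at 8.

```r
set.seed(1)
n <- 1e5

# Assumed stand-in for the overdispersed counts: negative binomial with
# mean mu = 3 and variance mu + mu^2/size = 30, which gives size = 1/3.
mu   <- 3
size <- mu^2 / (30 - mu)
y    <- rnbinom(n, mu = mu, size = size)

# Dividing by a constant k scales the mean by 1/k and the variance by 1/k^2.
scaled <- sapply(c(1, 8, 10), function(k) {
  c(k = k, mean = mean(y / k), var = var(y / k))
})
print(t(scaled))

# Mean and variance roughly agree after dividing by 10 ...
rate10 <- y / 10
# ... but the rescaled values are no longer integers.
print(head(sort(unique(rate10))))
```

Even at k = 10 the rescaled values sit on multiples of 0.1 rather than on the integers, so matching the first two moments does not by itself make them Poisson.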
> This just seemed against all of my training (even though as I noted
> I'm not a statistician) because the point there is that the original
> count data is overdispersed, not that dividing it by some effort
> variable makes it seem poisson distributed (even though it's now rate
> data... odd). Is there some critical reference that I've missed that I
> could just point to that suggests /not/ to engage in such strange practices?
>
I don't think there's a reference: welcome to the cutting edge ... I
agree (and was almost going to mention) that under other circumstances
(quasi-likelihood estimation), we do almost the equivalent of this
scaling in order to remove overdispersion. I don't think this will
necessarily work right (I haven't thought it all the way through) with
sampling periods of different lengths/sizes, though.
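
A rough way to look at the unequal-effort case is to simulate it. The sketch below uses plain glm() rather than lmer() so that it runs without extra packages, drops the random effects entirely, and invents the data frame d and its columns (count, effort, year) purely for illustration. It also assumes the usual convention that, with a log link, the offset enters as log(effort).

```r
set.seed(2)
n <- 500
d <- data.frame(
  year   = rep(0:9, length.out = n),
  effort = runif(n, 0.5, 20)            # hypothetical exposure per sample
)
d$count <- rpois(n, lambda = d$effort * exp(-1 + 0.1 * d$year))
d$rate  <- d$count / d$effort

# Counts with a log(effort) offset: models the rate, keeps integer responses.
fit_offset <- glm(count ~ year + offset(log(effort)),
                  family = poisson(link = "log"), data = d)

# Rate as the response: glm() still fits, but warns about non-integer values.
n_warn <- 0
fit_rate <- withCallingHandlers(
  glm(rate ~ year, family = poisson(link = "log"), data = d),
  warning = function(w) {
    n_warn <<- n_warn + 1
    invokeRestart("muffleWarning")
  }
)

# Rate response weighted by effort: same estimating equations as the offset fit.
fit_weighted <- suppressWarnings(
  glm(rate ~ year, family = poisson(link = "log"), weights = effort, data = d)
)

se <- function(fit) summary(fit)$coefficients[, "Std. Error"]
print(cbind(offset = coef(fit_offset), rate = coef(fit_rate),
            weighted = coef(fit_weighted)))
print(cbind(offset = se(fit_offset), rate = se(fit_rate),
            weighted = se(fit_weighted)))
```

The unweighted rate fit treats a rate observed over 0.5 units of effort as being just as informative as one observed over 20 units, which is where it parts ways with the offset model. Weighting the rates by effort recovers the offset fit's coefficients, but at that point one is back to the offset model in a less direct form.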
Ben Bolker
> Best,
> Dan Barton
>
> On Mon, Mar 14, 2011 at 6:39 PM, Ben Bolker <bbolker at gmail.com> wrote:
>
> On 11-03-14 07:14 PM, Daniel Barton wrote:
> > Hello,
> > Thanks to everyone who contributes to this list! I often find random
> > questions I have answered in the archives of this list.
> >
> > My specific question of the moment, a simplified example of what I'm
> > doing that I hope illustrates my question...
> >
> > If we have a poisson-distributed response variable in a mixed model
> > such as called by:
> >
> > lmer(amrotot ~ year + (year|route), family=poisson(link=log))
> >
> > where amrotot is an integer count, year is, well, the year (as a linear
> > predictor, not a factor) and route is a sampling unit. If 'exposure'
> > varies by route, we can define another model with an offset such as:
> >
> > lmer(amrotot ~ year + (year|route), offset=effort, family=poisson(link=log))
> >
> > this all seems, generally good and fine. A colleague asked me why not
> > use (amrotot/effort) as the response variable, but this of course
> > results in a non-integer response variable. Yet it turns out, lmer (or
> > glm, for that matter) will indeed estimate a model using the non-integer
> > response variable (amrotot/effort) but gives warnings. I understand that
> > poisson regression assumes a poisson-distributed integer response
> > variable, but I was curious about *why* lmer would provide results for
> > non-integer response variables such as (amrotot/effort) and if these
> > results are valid or somehow comparable to results where amrotot is the
> > response and effort is an offset, with special reference to the
> > confidence intervals of the random effects. Using non-integer response
> > variables in poisson regression looks and seems wrong to me, but IANA
> > statistician and maybe lmer is doing something I don't quite get to make
> > this work.
>
> It won't work: there's a reason that Poisson models are restricted
> to count data. In particular, in Poisson models the assumption is that
> the (expected) variance is equal to the (expected) mean for any data
> point: if you rescale the data points, then the variance-to-mean
> relationship will change with the units used, something you probably
> don't want.
>
> e.g. if the sampling period is 1 hour and you expect 1 count in the
> sampling period, the mean and variance will both be 1 (unitless); if you
> divide the counts by 60 to get counts per minute, then the mean becomes
> 1/60 counts/minute but the variance will be scaled to 1/3600
> (counts/minute)^2, so the two are no longer equal ....
>
> You ask why glmer (and glm) lets you do this. It's generally difficult
> to decide if one should prohibit, or just warn about, a practice that
> seems odd. Sometimes there are indeed plausible scenarios (although to
> be honest I can't think of one in this case ...) where someone wants to
> use the software in a way not intended by the designers. I won't say
> that R is completely consistent in this regard, but overall the
> philosophy of "you are assumed to know what you are doing, we will warn
> you but not stop you if you seem to be doing something silly" is
> reasonable.
>
> Ben Bolker
>
>