[R] GLM function with poisson distribution

Wed Jan 26 10:20:47 CET 2005

On Wed, Jan 26, 2005 at 01:39:36AM +0100, Peter Dalgaard wrote:
> Thomas Lumley <tlumley at u.washington.edu> writes:
> 
> > On Tue, 25 Jan 2005, Florian Menzel wrote:
> > 
> > > Hello all,
> > > I found a weird result of the GLM function that seems
> > > to be a bug.
> > 
> > No, the problem is that you are using the Wald test when the mle is
> > infinite, which is always going to be unreliable.  It's even worse
> > because you are using data that couldn't really have come from a
> > Poisson distribution (because for a=1 you have mean 3 and variance 0).
> 
> Actually, that's rather immaterial. 

Right. For the Hauck[sic]-Donner effect. This two-sample problem admits a
two-dimensional sufficient statistic, the totals in the two groups. In this
example Xtot = 0, Ytot = 24. So the problem is reduced to: Given two
independent Poisson observations X = 0 and Y = 24, can they possibly come
from the same distribution?

Thomas' observation is relevant from a practical point of view; by
reducing data to sufficient statistics, we lose the possibility to check
model assumptions, here the iid (within groups) Poisson distribution.

> Try it with
> 
> b=c(rep(0,8),rmultinom(1,24,rep(1/8,8))) 

This is the conditional distribution, given the sufficient statistic.

> or even
> 
> b=c(rep(0,8),rpois(8,3))

This is the unconditional distribution, given that the parameter is
equal to the mle.

> (in the former case, notice that the LRT is constant).

because the LRT statistic is a function of data only through the sufficient 
statistic.

What is the "correct" p-value in this case? The exact p-value for the LRT
is hard to calculate (the distribution of the test statistic under the null
depends on thrue, common, unknown  parameter), but
the conditional test, given X+Y, (has certain optimality properties) is
very easy to perform. The p-value is 2 * dbinom(0, 24, 0.5) = 1.192093e-07,
compared to the LRT  8.017e-09. 

Who cares? But suppose the observed value of sufficient statistic was 
(0, 5). Then the two p-values are 0.0625 and 0.00847, respectively, and one
should care. 

Besides the binomial test, you can also use a bootstrap approach, by
simulating iid poisson pairs with mean (X+Y)/2 and calculate the LRT
statistic. For (X, Y) = (0, 5), this results in a p-value of about 0.018
(one million replicates). So, it seems as if the conditional test is too
conservative, and the asymptotics are far too optimistic. I _think_ this is
"common knowledge".  

-- 
 Göran Broström                    tel: +46 90 786 5223
 Department of Statistics          fax: +46 90 786 6614
 Umeå University                   http://www.stat.umu.se/egna/gb/
 SE-90187 Umeå, Sweden             e-mail: gb at stat.umu.se