[R] GLM function with poisson distribution
Göran Broström
gb at tal.stat.umu.se
Wed Jan 26 10:20:47 CET 2005
On Wed, Jan 26, 2005 at 01:39:36AM +0100, Peter Dalgaard wrote:
> Thomas Lumley <tlumley at u.washington.edu> writes:
>
> > On Tue, 25 Jan 2005, Florian Menzel wrote:
> >
> > > Hello all,
> > > I found a weird result of the GLM function that seems
> > > to be a bug.
> >
> > No, the problem is that you are using the Wald test when the mle is
> > infinite, which is always going to be unreliable. It's even worse
> > because you are using data that couldn't really have come from a
> > Poisson distribution (because for a=1 you have mean 3 and variance 0).
>
> Actually, that's rather immaterial.
Right. For the Hauck[sic]-Donner effect. This two-sample problem admits a
two-dimensional sufficient statistic, the totals in the two groups. In this
example Xtot = 0, Ytot = 24. So the problem is reduced to: Given two
independent Poisson observations X = 0 and Y = 24, can they possibly come
from the same distribution?
Thomas' observation is relevant from a practical point of view; by
reducing data to sufficient statistics, we lose the possibility to check
model assumptions, here the iid (within groups) Poisson distribution.
> Try it with
>
> b=c(rep(0,8),rmultinom(1,24,rep(1/8,8)))
This is the conditional distribution, given the sufficient statistic.
> or even
>
> b=c(rep(0,8),rpois(8,3))
This is the unconditional distribution, given that the parameter is
equal to the mle.
> (in the former case, notice that the LRT is constant).
because the LRT statistic is a function of data only through the sufficient
statistic.
What is the "correct" p-value in this case? The exact p-value for the LRT
is hard to calculate (the distribution of the test statistic under the null
depends on thrue, common, unknown parameter), but
the conditional test, given X+Y, (has certain optimality properties) is
very easy to perform. The p-value is 2 * dbinom(0, 24, 0.5) = 1.192093e-07,
compared to the LRT 8.017e-09.
Who cares? But suppose the observed value of sufficient statistic was
(0, 5). Then the two p-values are 0.0625 and 0.00847, respectively, and one
should care.
Besides the binomial test, you can also use a bootstrap approach, by
simulating iid poisson pairs with mean (X+Y)/2 and calculate the LRT
statistic. For (X, Y) = (0, 5), this results in a p-value of about 0.018
(one million replicates). So, it seems as if the conditional test is too
conservative, and the asymptotics are far too optimistic. I _think_ this is
"common knowledge".
--
Göran Broström tel: +46 90 786 5223
Department of Statistics fax: +46 90 786 6614
Umeå University http://www.stat.umu.se/egna/gb/
SE-90187 Umeå, Sweden e-mail: gb at stat.umu.se
More information about the R-help
mailing list