[R] power and sample size for a GLM with Poisson response variable

Ken Beath kbeath at efs.mq.edu.au
Thu Mar 9 13:18:41 CET 2006

```The problem with the simulation is with the second group where there
is a high probability of obtaining all zeroes for the sample and this
is causing problems with the Wald statistics you are using to check
for a difference.  Here is an example.

Browse[1]> summary(res)

Call:
glm(formula = y[i, ] ~ group, family = poisson())

Deviance Residuals:
Min          1Q      Median          3Q         Max
-1.290e-01  -1.290e-01  -1.231e-05  -1.231e-05   2.756e+00

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept)   -4.7884     0.3333 -14.365   <2e-16 ***
group2       -18.5142  1235.1829  -0.015    0.988
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 110.881  on 4260  degrees of freedom
Residual deviance:  86.192  on 4259  degrees of freedom
AIC: 108.19

Number of Fisher Scoring iterations: 21

On Wed, 8 Mar 2006 Wassell, James T., Ph.D. wrote:
> Subject: Re: [R] power and sample size for a GLM with Poisson response
> 	variable
> To: <r-help at stat.math.ethz.ch>
> Cc: "Craig A. Faulhaber" <caf at gis.usu.edu>
> Message-ID:
> 	<AF2DCD619279544BA454141F4A45B9E306A3E8 at m-niosh-3.niosh.cdc.gov>
> Content-Type: text/plain;	charset="US-ASCII"
>
> Craig, Thanks for your follow-up note on using the asypow package.  My
> problem was not only constructing the "constraints" vector but, for my
> particular situation (Poisson regression, two groups, sample sizes of
> (1081,3180), I get very different results using asypow package
> compared
> to my other (home grown) approaches.
>
> library(asypow)
> pois.mean<-c(0.0065,0.0003)
> info.pois <- info.poisson.kgroup(pois.mean, group.size=c(1081,3180))
> constraints <- matrix(c(2,1,2), ncol = 3, byrow=T)
> poisson.object <- asypow.noncent(pois.mean, info.pois,constraints)
> power.pois <- asypow.power(poisson.object, c(1081,3180), 0.05)
> print(power.pois)
>
> [1] 0.2438309
>
> asy.pwr()   #  the function is shown below.
>
> \$power
> [1] 0.96
>
> sim.pwr()   #  the function is shown below.
> 4261000 Poisson random variates simulated
> \$power
> [1] 0.567
>
> I tend to think the true power is greater than 0.567 but less than
> 0.96
> (not as small as 0.2438309).
>
> Maybe I am still not using the asypow functions correctly?
>
>
> -----Original Message-----
> From: Craig A. Faulhaber [mailto:caf at gis.usu.edu]
> Sent: Saturday, March 04, 2006 12:04 PM
> To: Wassell, James T., Ph.D.
> Subject: Re: [R] power and sample size for a GLM with Poisson response
> variable
>
> Hi James,
>
> Thanks again for your help.  With the assistance of a statistician
> colleauge of mine, I figured out the "constraints" vector.  It is a
> 3-column vector describing the null hypothesis.  For example, let's
> say
> you have 3 populations and your null hypothesis is that there is no
> difference between the 3.  The first row of the matrix would be "3
> 1 2"
> indicating that you have 3 populations and that populations 1 and 2
> are
> equal.  The second row would be "3 2 3" indicating that you have 3
> populations and that populations 2 and 3 are equal.  If you had only 2
> populations, there would be only one row ("2 1 2", indicating 2
> populations with 1 and 2 equal).  I hope this helps.
>
> Craig
>
> Wassell, James T., Ph.D. wrote:
>
>> Craig,   I found the package asypow difficult to use and it did not
>> yield results in the ballpark of other approaches.   (could not
>> figure
>
>> out the "constraints" vector).
>>
>> I wrote some simple functions, one asy.pwr uses the non-central
>> chi-square distn.
>>
>> asy.pwr<-function(counts=c(7,1),py=c(1081,3180))
>> { # a two group Poisson regression power computation #  py is
>> person-years or person-time however measured
>> group<-gl(2,1)
>> ncp<-summary(glm(counts~group+offset(log(py)),family=poisson))
>> \$null.de
>> viance
>>
>> q.tile<-qchisq(.95,1)  #  actually just the X2 critical value of
>> 3.841459 list(power=round(1-pchisq(q.tile,df=1,ncp),2))}
>>
>> The second function, sim.pwr,  estimates power using simulated
>> Poisson
>
>> random variates.  The most time consuming step is the call to rpois.
>
>> (Maybe someone knows a more efficient way to accomplish this?).  The
>> "for" loop is rather quick in comparison.   I hope you may find this
>> pass along your approach.    Note, that for this problem,  very small
>> values of lambda, the two approaches give  much different power
>> estimates  (96% vs. 55% or so).   My problem may be better addressed
>> as binomial logistic regression, maybe then the simulation and the
>> asymptotic estimates my agree better.
>>
>> sim.pwr<-function(means=c(0.0065,0.0003),ptime=c
>> (1081,3180),nsim=1000)
>> { # a two group poisson regression power computation # based
>> simulating lots of Poisson r.v.'s # input rates followed by a vector
>> of the corresponding person times # the most time consuming part is
>> the r.v. generation.
>> # power is determined by counting the how often p-values are <= 0.05
>> group<-as.factor(rep(c(1,2),ptime))
>> rej<-vector(length=nsim)
>> y<-rpois(ptime[1]*nsim,means[1])
>> y<-c(y,rpois(ptime[2]*nsim,means[2]))
>> y<-matrix(y,nrow=nsim)
>> cat(sum(ptime)*nsim,"Poisson random variates simulated","\n") for
>> (i in
>
>> 1:nsim){res<-glm(y[i,]~group,family=poisson())
>> rej[i]<-summary(res)\$coeff[2,4]<=0.05}
>> list(power=sum(rej)/nsim) }
>>
>>
>>
>>
>

```