[R-sig-ME] inflated standard error for zero counts in Poisson lmer

Fri Sep 23 02:36:15 CEST 2011

What you are seeing is the Poisson errors version of the Hauck-Donner effect.
Because you are using a log link (the default for Poisson), the estimate for the
parameter for the mean for the group that has the zeros goes as close as the
fitting algorithm will allow to -Inf : you get a value that is very large and negative.  
The SE is accordingly huge -- basically it is dragged along with the effect
estimate to get as close as the fitting algorithm will allow to Inf.    The 
approximation for the SED that is used in the Wald test that generates the 
t-statistics gives an SED that is meaningless.

For comparisons with the zero level, you need to use a likelihood ratio test
to compare the model you've fitted with a model in which the two factor levels
are combined. 

There's a discussion in Maindonald and Braun: Data Analysis and Graphics
using R (3rd end, 2010), pp. 261-266, also in the 2nd edition.  The analysis is 
for the moths data (species A) in the DAAG package.  

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm

On 22/09/2011, at 7:25 PM, Claire Brittain wrote:

> 
> Dear R users,
> I am running a mixed model on some count data (n=90) using lmer with a Poisson error distribution and have two nested random effects based on repeated sampling of locations.
> One of my explanatory variables is a three level factor with n=30 observations for each level (in the R output below it's the factor "nathabcat"). When you plot this factor against the counts you can see that for one level the counts are very high, for one level low and for another level extremely low - that is all the counts for that level were zero (in the R output copied below "nathabcatlow"). In the model the difference between the very high and low levels comes out as significant but the difference between the high level and the extremely low level is non-significant. This does not make sense and it appears to come from an incredibly high standard error for the model estimate of the extremely low factor level. Intuitively I think that the standard error should not be high as all the values in that level are zero. It seems as though the model is not treating the zeros as I would expect.
> If I transform the count variable to x+1 so all the zeros become 1s it runs fine with no inflated standard error and the model finds a significant difference between the extremely low and high levels of the explanatory factor. Even if I add a single 1 to the zero counts in the extremely low level of the explanatory factor the model runs normally.
> Can anyone suggest why having only zeros is causing this problem?
> Many thanks
> Claire
> 
> When run on the count data:
> 
> model<-lmer(counts~distcat+nathabcat+avopen+(1|site:distcat),family=poisson)
> 
> summary(model)
> Generalized linear mixed model fit by the Laplace approximation 
> Formula: counts~ distcat + nathabcat + avopen + (1 | site:distcat) 
>   AIC   BIC logLik deviance
> 133.4 148.4 -60.72    121.4
> Random effects:
> Groups       Name        Variance Std.Dev.
> site:distcat (Intercept) 0.47972  0.69262 
> Number of obs: 90, groups: site:distcat, 30
> Fixed effects:
>                 Estimate Std. Error z value Pr(>|z|)    
> (Intercept)     8.926e-01  5.723e-01   1.560 0.118849    
> distcat1       -1.565e+00  4.375e-01  -3.578 0.000346 ***
> nathabcatlow   -1.969e+01  1.681e+03  -0.012 0.990654    
> nathabcatstrip -2.743e+00  4.897e-01  -5.602 2.11e-08 ***
> avopen          1.556e-02  5.474e-03   2.842 0.004480 ** 
> ---
> Signif. codes:  0 Œ***‚ 0.001 Œ**‚ 0.01 Œ*‚ 0.05 Œ.‚ 0.1 Œ ‚ 1 
> Correlation of Fixed Effects:
>            (Intr) dstct1 nthbctl nthbcts
> distcat1    -0.392                       
> nathabcatlw  0.000  0.000                
> nathbctstrp -0.394  0.233  0.000         
> avopen      -0.840  0.058  0.000   0.132 
> 
> 
> When run on the counts+1:
> 
> modelb<-lmer(countsplusone~distcat+nathabcat+avopen+(1|site:distcat),family=poisson)
> summary(modelb)
> 
> Generalized linear mixed model fit by the Laplace approximation 
> Formula: countsplusone~ distcat + nathabcat + avopen + (1 | site:distcat) 
>   AIC   BIC logLik deviance
> 104.5 119.5 -46.24    92.49
> Random effects:
> Groups       Name        Variance Std.Dev.
> site:distcat (Intercept) 0.12255  0.35007 
> Number of obs: 90, groups: site:distcat, 30
> Fixed effects:
>                Estimate Std. Error z value Pr(>|z|)    
> (Intercept)     1.327092   0.377347   3.517 0.000437 ***
> distcat1       -0.681733   0.200206  -3.405 0.000661 ***
> nathabcatlow   -1.686040   0.263526  -6.398 1.57e-10 ***
> nathabcatstrip -1.399987   0.240289  -5.826 5.67e-09 ***
> avopen          0.009323   0.003986   2.339 0.019352 *  
> ---
> Signif. codes:  0 Œ***‚ 0.001 Œ**‚ 0.01 Œ*‚ 0.05 Œ.‚ 0.1 Œ ‚ 1 
> Correlation of Fixed Effects:
>            (Intr) dstct1 nthbctl nthbcts
> distcat1    -0.262                       
> nathabcatlw -0.404  0.021                
> nathbctstrp -0.386  0.040  0.335         
> avopen      -0.906  0.037  0.241   0.196  		 	   		  
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models