[R] problem with glm(family=binomial) when some levels have only 0 proportion values
Jürg Schulze
Juerg.Schulze at stud.unibas.ch
Wed Mar 2 11:01:42 CET 2011
Hello everybody
I want to compare the proportions of germinated seeds (seed batches of
size 10) of three plant types (1,2,3) with a glm with binomial data
(following the method in Crawley: Statistics,an introduction using R,
p.247).
The problem seems to be that in two plant types (2,3) all plants have
proportions = 0.
I give you my data and the model I'm running:
success failure type
[1,] 0 10 3
[2,] 0 10 2
[3,] 0 10 2
[4,] 0 10 2
[5,] 0 10 2
[6,] 0 10 2
[7,] 0 10 2
[8,] 4 6 1
[9,] 4 6 1
[10,] 3 7 1
[11,] 5 5 1
[12,] 7 3 1
[13,] 4 6 1
[14,] 0 10 3
[15,] 0 10 3
[16,] 0 10 3
[17,] 0 10 3
[18,] 0 10 3
[19,] 0 10 3
[20,] 0 10 2
[21,] 0 10 2
[22,] 0 10 2
[23,] 9 1 1
[24,] 6 4 1
[25,] 4 6 1
[26,] 0 10 3
[27,] 0 10 3
y<- cbind(success, failure)
Call:
glm(formula = y ~ type, family = binomial)
Deviance Residuals:
Min 1Q Median 3Q
-1.3521849 -0.0000427 -0.0000427 -0.0000427
Max
2.6477556
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.04445 0.21087 0.211 0.833
typeFxC -23.16283 6696.13233 -0.003 0.997
typeFxD -23.16283 6696.13233 -0.003 0.997
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 134.395 on 26 degrees of freedom
Residual deviance: 12.622 on 24 degrees of freedom
AIC: 42.437
Number of Fisher Scoring iterations: 20
Huge standard errors are calculated and there is no difference between
plant type 1 and 2 or between plant type 1 and 3.
If I add 1 to all successes, so that all the 0 values disappear, the
standard error becomes lower and I find highly significant differences
between the plant types.
suc<- success + 1
fail<- 11 - suc
Y<- cbind(suc,fail)
Call:
glm(formula = Y ~ type, family = binomial)
Deviance Residuals:
Min 1Q Median 3Q
-1.279e+00 -4.712e-08 -4.712e-08 0.000e+00
Max
2.584e+00
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.2231 0.2023 1.103 0.27
typeFxC -2.5257 0.4039 -6.253 4.02e-10 ***
typeFxD -2.5257 0.4039 -6.253 4.02e-10 ***
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 86.391 on 26 degrees of freedom
Residual deviance: 11.793 on 24 degrees of freedom
AIC: 76.77
Number of Fisher Scoring iterations: 4
So I think the 0 values of all plants of group 2 and 3 are the
problem, do you agree?
I don't know why this is a problem, or how I can explain to a reviewer
why a data transformation (+ 1) is necessary with such a dataset.
I would greatly appreciate any comments.
Juerg
______________________________________
Jürg Schulze
Department of Environmental Sciences
Section of Conservation Biology
University of Basel
St. Johanns-Vorstadt 10
4056 Basel, Switzerland
Tel.: ++41/61/267 08 47
More information about the R-help
mailing list