[R-sig-ME] Zero cells in contrast matrix problem

Wed May 27 23:21:28 CEST 2015

You may need to consider using an 'exact', Bayesian, or penalized likelihood approach (along the lines proposed by Firth).
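
To make the penalized-likelihood idea concrete, here is a minimal base-R sketch of Firth's modified-score iteration on made-up data with one empty cell. The data frame `d`, its columns and counts, and the helper `firth_logit()` are all invented for illustration. This is not your data, and for real work a maintained implementation would be preferable.

```r
# Made-up data (not the poster's): 3 conditions, 40 trials each,
# with zero correct responses in level "B".
d <- data.frame(
  cond    = factor(rep(c("ref", "A", "B"), each = 40),
                   levels = c("ref", "A", "B")),
  correct = c(rep(1:0, c(6, 34)), rep(1:0, c(11, 29)), rep(0, 40))
)

# Ordinary ML fit: the estimate for level B drifts toward -Inf, so its
# reported SE is huge (R may also warn about fitted 0/1 probabilities).
ml <- suppressWarnings(glm(correct ~ cond, data = d, family = binomial))

# Hypothetical helper: Firth-type penalized logistic regression via
# Fisher scoring on the modified score X'(y - p + h * (1/2 - p)).
firth_logit <- function(X, y, tol = 1e-8, maxit = 100) {
  beta <- rep(0, ncol(X))
  for (i in seq_len(maxit)) {
    p    <- as.vector(plogis(X %*% beta))
    w    <- p * (1 - p)
    vcv  <- solve(crossprod(X, w * X))       # inverse of X'WX
    h    <- w * rowSums((X %*% vcv) * X)     # hat-matrix diagonals
    step <- as.vector(vcv %*% crossprod(X, y - p + h * (0.5 - p)))
    beta <- beta + step
    if (max(abs(step)) < tol) break
  }
  # Standard errors from the inverse information at the final estimate
  p   <- as.vector(plogis(X %*% beta))
  vcv <- solve(crossprod(X, p * (1 - p) * X))
  se  <- sqrt(diag(vcv))
  data.frame(term = colnames(X), estimate = beta, se = se,
             z = beta / se, row.names = NULL)
}

X     <- model.matrix(~ cond, data = d)
firth <- firth_logit(X, d$correct)

coef(summary(ml))   # level B: very large estimate and SE
firth               # level B: finite estimate and SE
```

For a single-factor model like this one, the penalized fit works out to adding half a success and half a failure to each cell, so the fitted probability for level B is 0.5/41 rather than 0. The Wald z values returned by this sketch are only a rough guide; tests based on the profile penalized likelihood are usually preferred.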

Maybe a place to start: http://sas-and-r.blogspot.nl/2010/11/example-815-firth-logistic-regression.html
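
In R, one commonly used implementation of Firth's approach is the logistf package. A minimal sketch, assuming that package is installed and again using made-up data (the data frame `d` and its counts are invented, not taken from your study):

```r
# Made-up data (not the poster's): level "B" has no correct responses.
d <- data.frame(
  cond    = factor(rep(c("ref", "A", "B"), each = 40),
                   levels = c("ref", "A", "B")),
  correct = c(rep(1:0, c(6, 34)), rep(1:0, c(11, 29)), rep(0, 40))
)

if (requireNamespace("logistf", quietly = TRUE)) {
  # Firth-penalized logistic regression; the B vs. reference contrast
  # gets a finite estimate even though the cell is empty.
  fit <- logistf::logistf(correct ~ cond, data = d)
  summary(fit)
} else {
  message("Package 'logistf' is not installed; try install.packages('logistf').")
}
```

Note that this is a fixed-effects-only model, like the glm in your message, so it does not account for the participant random effect.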

Best,
Wolfgang

> -----Original Message-----
> From: R-sig-mixed-models [mailto:r-sig-mixed-models-bounces at r-project.org]
> On Behalf Of Francesco Romano
> Sent: Wednesday, May 27, 2015 23:00
> To: r-sig-mixed-models at r-project.org
> Subject: [R-sig-ME] Zero cells in contrast matrix problem
> 
> After giving up on a glmer for my data, I remembered a post by Roger Levy
> suggesting the use of a non-mixed-effects glm when one of the cells in a
> matrix is zero.
> 
> To put this into perspective:
> 
> > trial <- glmer(Correct ~ Syntax.Semantics + (1 | Part.name), data = trialglm, family = binomial)
> 
> Warning messages:
> 1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
>   Model failed to converge with max|grad| = 0.053657 (tol = 0.001, component 4)
> 2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
>   Model is nearly unidentifiable: large eigenvalue ratio
>  - Rescale variables?
> 
> My data has a binary outcome, correct or incorrect, a fixed effect
> predictor factor with 8 levels, and a random effect for participants. I
> believe the problem R is encountering is with one level of the factor
> (let us call it level B) which has no counts (no, I won't try to post the
> table from the paper with the counts because I know it will get garbled up!).
> 
> I attempt a glm with the same data:
> 
> > trial <- glm(Correct ~ Syntax.Semantics, data = trialglm, family = binomial)
> > anova(trial)
> Analysis of Deviance Table
> 
> Model: binomial, link: logit
> 
> Response: Correct
> 
> Terms added sequentially (first to last)
> 
> 
>                  Df Deviance Resid. Df Resid. Dev
> NULL                               384     289.63
> Syntax.Semantics  7   34.651       377     254.97
> > summary(trial)
> 
> Call:
> glm(formula = Correct ~ Syntax.Semantics, family = binomial,
>     data = trialglm)
> 
> Deviance Residuals:
>      Min        1Q    Median        3Q       Max
> -0.79480  -0.62569  -0.34474  -0.00013   2.52113
> 
> Coefficients:
>                            Estimate Std. Error z value Pr(>|z|)
> (Intercept)                 -1.6917     0.4113  -4.113 3.91e-05 ***
> Syntax.Semantics A           0.7013     0.5241   1.338   0.1809
> Syntax.Semantics B         -16.8744   904.5273  -0.019   0.9851
> Syntax.Semantics C          -1.1015     0.7231  -1.523   0.1277
> Syntax.Semantics D           0.1602     0.5667   0.283   0.7774
> Syntax.Semantics E          -0.8733     0.7267  -1.202   0.2295
> Syntax.Semantics F          -1.4438     0.8312  -1.737   0.0824 .
> Syntax.Semantics G           0.4630     0.5262   0.880   0.3789
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
> (Dispersion parameter for binomial family taken to be 1)
> 
>     Null deviance: 289.63  on 384  degrees of freedom
> Residual deviance: 254.98  on 377  degrees of freedom
> AIC: 270.98
> 
> Number of Fisher Scoring iterations: 17
> 
> The comparison I'm interested in is between level B and the reference
> level, but it cannot be estimated, as shown by the ridiculously high
> estimate and SE value.
> 
> Any suggestions on how to get a decent beta, SE, z, and p? It's the only
> comparison missing in the table for the levels I need so I think it would
> be a bit unacademic of me to close this deal saying 'the difference could
> not be estimated due to zero count'.
> 
> And by the way I have seen this comparison being generated using other
> stats.
> 
> Thanks in advance,
> 
> Frank