[R-sig-ME] Zero cells in contrast matrix problem

Fri May 29 20:32:24 CEST 2015

  [Please keep r-sig-mixed-models in the cc: list ...]

 It would seem consistent to me to use the penalized/Bayesian results
throughout, not just for the completely separated term. (It's not
"MCMCglmm-adjusted", unless I'm missing something ...)

On Fri, May 29, 2015 at 2:30 PM, Francesco Romano
<francescobryanromano at gmail.com> wrote:
> Ben that worked a charm!
> I guess now I'm only left with the question of whether it would be more
> appropriate to report the stats for all the pairwise comparisons, including
> those that were not subject to complete separation, from the
> MCMCglmm-adjusted model. In other words, the separation problem was only
> relevant to level B because there were no incorrect answers at all.
>
> Frank
>
> On Thu, May 28, 2015 at 8:39 PM, Ben Bolker <bbolker at gmail.com> wrote:
>>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> On 15-05-28 06:55 AM, Francesco Romano wrote:
>> > Many thanks to both.
>> >
>> > The approaches you suggest (and others online) help one deal with
>> > the separation problem but don't offer any specific advice as to
>> > how getting a valid p coefficient when comparing two levels of the
>> > model vexed by separation.
>> >
>> > Ben, here's the output of the bglmer which by the way would be
>> > ideal since it allows me to retain the random effect so that all my
>> > pairwise comparisons are conducted using mixed effects.
>> >
>> >> trial<-bglmer(Correct ~ Syntax.Semantics+(1|Part.name), data =
>> >> trialglm,
>> > family = binomial) Warning message: package ‘blme’ was built under
>> > R version 3.1.2
>> >> summary(trial)
>> > Cov prior  : Part.name ~ wishart(df = 3.5, scale = Inf,
>> > posterior.scale = cov, common.scale = TRUE) Prior dev  : 1.4371
>> >
>> > Generalized linear mixed model fit by maximum likelihood (Laplace
>> > Approximation) ['bglmerMod'] Family: binomial  ( logit ) Formula:
>> > Correct ~ Syntax.Semantics + (1 | Part.name) Data: trialglm
>> >
>> > AIC      BIC   logLik deviance df.resid 269.9    305.5   -126.0
>> > 251.9      376
>> >
>> > Scaled residuals: Min      1Q  Median      3Q     Max -0.9828
>> > -0.4281 -0.2445 -0.0002  5.7872
>> >
>> > Random effects: Groups    Name        Variance Std.Dev. Part.name
>> > (Intercept) 0.3836   0.6194 Number of obs: 385, groups:  Part.name,
>> > 16
>> >
>> > Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept)
>> > -1.8671     0.4538  -4.114 3.89e-05 *** Syntax.Semantics A
>> > 0.8121     0.5397   1.505   0.1324 Syntax.Semantics B  -16.4391
>> > 1195.5031  -0.014   0.9890 Syntax.Semantics C   -1.1323     0.7462
>> > -1.517   0.1292 Syntax.Semantics D    0.1789     0.5853   0.306
>> > 0.7598 Syntax.Semantics E   -0.8071     0.7500  -1.076   0.2819
>> > Syntax.Semantics F   -1.5051     0.8575  -1.755   0.0792 .
>> > Syntax.Semantics G    0.4395     0.5417   0.811   0.4171 ---
>> > Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>> >
>> > Unfortunately the separation problem is still there. Should I be
>> > constraining the parameter somehow? How would I do that? The data
>> > is below.
>>
>>    Did you read the section in the URL I suggested?  Just using bglmer
>> isn't enough; you also have to set a prior on the fixed effects.
>>
>>   Your data don't seem to be attached (note that the mailing list
>> strips most non-ASCII file types).
>>
>> >
>> > In passing I also tried brglm which solves the separation problem
>> > but tells me comparison is significant which I don't believe one
>> > bit (see the data below). I am pretty sure about this because when
>> > I reveled and look at the comparisons I was able to compute using
>> > glmer, these turn out to be non-significant, when glmer told me
>> > they were:
>> >
>> >> trial<-brglm(Correct ~ Syntax.Semantics, data = trialglm, family
>> >> =
>> > binomial) Warning messages: 1: package ‘elrm’ was built under R
>> > version 3.1.2 2: package ‘coda’ was built under R version 3.1.3
>> >> summary(trial)
>> >
>> > Call: brglm(formula = Correct ~ Syntax.Semantics, family =
>> > binomial, data = trialglm)
>> >
>> >
>> > Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept)
>> > -1.6358     0.4035  -4.053 5.05e-05 *** Syntax.Semantics A   0.6689
>> > 0.5169   1.294   0.1957 Syntax.Semantics B  -3.0182     1.4902
>> > -2.025   0.0428 * Syntax.Semantics C  -1.0135     0.6889  -1.471
>> > 0.1413 Syntax.Semantics D   0.1515     0.5571   0.272   0.7857
>> > Syntax.Semantics E  -0.7878     0.6937  -1.136   0.2561
>> > Syntax.Semantics F  -1.2874     0.7702  -1.672   0.0946 .
>> > Syntax.Semantics G   0.4358     0.5186   0.840   0.4007 --- Signif.
>> > codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>> >
>> > (Dispersion parameter for binomial family taken to be 1)
>> >
>> > Null deviance: 262.51  on 384  degrees of freedom Residual
>> > deviance: 256.22  on 377  degrees of freedom Penalized deviance:
>> > 245.5554 AIC:  272.22
>> >
>> >
>> > MCMCglmm is too complex for me.
>> >
>> > Wolfgang, I tried the penalized likelihood method (logistf
>> > function) but output is hard to read:
>> >
>> >> trial<-logistf(Correct ~ Syntax.Semantics, data = trialglm,
>> >> family =
>> > binomial) Warning messages: 1: package ‘logistf’ was built under R
>> > version 3.1.2 2: package ‘mice’ was built under R version 3.1.2
>> >> summary(trial)
>> > logistf(formula = Correct ~ Syntax.Semantics, data = trialglm,
>> > family = binomial)
>> >
>> > Model fitted by Penalized ML Confidence intervals and p-values by
>> > Profile Likelihood Profile Likelihood Profile Likelihood Profile
>> > Likelihood Profile Likelihood Profile Likelihood Profile Likelihood
>> > Profile Likelihood
>> >
>> > coef  se(coef) lower 0.95 upper 0.95 Chisq            p (Intercept)
>> > 3.2094017 0.7724482  2.9648747  3.5127830 0.000000 1.000000e+00
>> > Syntax.Semantics A  4.1767737 6.3254344  0.4224696 12.0673987
>> > 64.224452 1.110223e-15 Syntax.Semantics B -1.0583602 0.8959376
>> > -1.3963977 -0.7625216  0.000000 1.000000e+00 Syntax.Semantics C
>> > -0.7299070 0.9308193 -1.0765598 -0.4180076  0.000000 1.000000e+00
>> > Syntax.Semantics D  0.2314740 1.1563731 -0.1704535  0.6479908
>> > 1.156512 2.821901e-01 Syntax.Semantics E -0.6476907 0.9771824
>> > -1.0076740 -0.3164066  0.000000 1.000000e+00 Syntax.Semantics F
>> > -0.8271499 0.9305931 -1.1743834 -0.5160799  0.000000 1.000000e+00
>> > Syntax.Semantics G  0.9909046 1.3787175  0.5457741  1.5353981
>> > 0.000000 1.000000e+00
>> >
>> > Likelihood ratio test=121.9841 on 7 df, p=0, n=385 Wald test =
>> > 5.334321 on 7 df, p = 0.6192356
>> >
>> > In particular, what is this model telling me? That Z (my ref level)
>> > and B are significantly different?
>> >
>> > I'm happy to try the elrm function with exact logistic regression
>> > but I am not capable of programming it. Besides, would it give me
>> > valid estimates for the comparison between the Z and B levels? The
>> > data frame should look like this:
>> >
>> > Outcome variable (Correct, incorrect) Predictor variable (A, B, C,
>> > D, E, F, G, Z) Counts (E: 38,3; B: 51,0; Z: 37,7; G: 40,12; D:
>> > 36,8; C:45,3; A: 34,13; F:65,22).
>> >
>> > Thank you! F.
>> >
>> > On Thu, May 28, 2015 at 2:28 AM, Ben Bolker <bbolker at gmail.com>
>> > wrote:
>> >
>> >> And for what it's worth, you can do this in conjunction with lme4
>> >> by using the blme package instead (a thin Bayesian wrapper around
>> >> lme4), or via the MCMCglmm package; see
>> >> http://ms.mcmaster.ca/~bolker/R/misc/foxchapter/bolker_chap.html
>> >> for an example (search for "complete separation").
>> >>
>> >> On Wed, May 27, 2015 at 5:21 PM, Viechtbauer Wolfgang (STAT)
>> >> <wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
>> >>> You may need to consider using an 'exact', Bayesian, or
>> >>> penalized
>> >> likelihood approach (along the lines proposed by Firth).
>> >>>
>> >>> Maybe a place to start:
>> >>
>> >> http://sas-and-r.blogspot.nl/2010/11/example-815-firth-logistic-regression.html
>> >>>
>> >>>
>> >>
>> Best,
>> >>> Wolfgang
>> >>>
>> >>>> -----Original Message----- From: R-sig-mixed-models
>> >>>> [mailto:r-sig-mixed-models-bounces at r- project.org] On Behalf
>> >>>> Of Francesco Romano Sent: Wednesday, May 27, 2015 23:00 To:
>> >>>> r-sig-mixed-models at r-project.org Subject: [R-sig-ME] Zero
>> >>>> cells in contrast matrix problem
>> >>>>
>> >>>> After giving up on a glmer for my data, I remembered a post
>> >>>> by Roger
>> >> Levy
>> >>>> suggesting to try the use non mixed effects glm when one of
>> >>>> the cells in a matrix is zero.
>> >>>>
>> >>>> To put this into perspective:
>> >>>>
>> >>>>> trial<-glmer(Correct ~ Syntax.Semantics + (1 | Part.name),
>> >>>>> data =
>> >>>> trialglm, family = binomial)
>> >>>>
>> >>>> Warning messages: 1: In checkConv(attr(opt, "derivs"),
>> >>>> opt$par, ctrl = control$checkConv, : Model failed to converge
>> >>>> with max|grad| = 0.053657 (tol = 0.001, component 4) 2: In
>> >>>> checkConv(attr(opt, "derivs"), opt$par, ctrl =
>> >>>> control$checkConv, : Model is nearly unidentifiable: large
>> >>>> eigenvalue ratio - Rescale variables?
>> >>>>
>> >>>> My data has a binary outcome, correct or incorrect, a fixed
>> >>>> effect predictor factor with 8 levels, and a random effect
>> >>>> for participants. I believe the problem R is encountering is
>> >>>> with one level of the factor (let us call it level B) which
>> >>>> has no counts (no I won' t try to post the table from the
>> >>>> paper with the counts because I know it will get garbled
>> >>>> up!).
>> >>>>
>> >>>> I attempt a glm with the same data:
>> >>>>
>> >>>>> trial<-glm(Correct ~ Syntax.Semantics, data = trialglm,
>> >>>>> family =
>> >>>> binomial)
>> >>>>> anova(trial)
>> >>>> Analysis of Deviance Table
>> >>>>
>> >>>> Model: binomial, link: logit
>> >>>>
>> >>>> Response: Correct
>> >>>>
>> >>>> Terms added sequentially (first to last)
>> >>>>
>> >>>>
>> >>>> Df Deviance Resid. Df Resid. Dev NULL
>> >>>> 384     289.63 Syntax.Semantics  7   34.651       377
>> >>>> 254.97
>> >>>>> summary(trial)
>> >>>>
>> >>>> Call: glm(formula = Correct ~ Syntax.Semantics, family =
>> >>>> binomial, data = trialglm)
>> >>>>
>> >>>> Deviance Residuals: Min        1Q    Median        3Q
>> >>>> Max -0.79480  -0.62569  -0.34474  -0.00013   2.52113
>> >>>>
>> >>>> Coefficients: Estimate Std. Error z value Pr(>|z|)
>> >>>> (Intercept)                 -1.6917     0.4113  -4.113
>> >>>> 3.91e-05 *** Syntax.Semantics A   0.7013     0.5241   1.338
>> >>>> 0.1809 Syntax.Semantics B -16.8744   904.5273  -0.019
>> >>>> 0.9851 Syntax.Semantics C  -1.1015     0.7231  -1.523
>> >>>> 0.1277 Syntax.Semantics D   0.1602     0.5667   0.283
>> >>>> 0.7774 Syntax.Semantics E  -0.8733     0.7267  -1.202
>> >>>> 0.2295 Syntax.Semantics F  -1.4438     0.8312  -1.737
>> >>>> 0.0824 . Syntax.Semantics G   0.4630     0.5262   0.880
>> >>>> 0.3789 --- Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05
>> >>>> ‘.’ 0.1 ‘ ’ 1
>> >>>>
>> >>>> (Dispersion parameter for binomial family taken to be 1)
>> >>>>
>> >>>> Null deviance: 289.63  on 384  degrees of freedom Residual
>> >>>> deviance: 254.98  on 377  degrees of freedom AIC: 270.98
>> >>>>
>> >>>> Number of Fisher Scoring iterations: 17
>> >>>>
>> >>>> The comparison I'm interested in is between level B and the
>> >>>> reference level but it cannot be estimated as shown by the
>> >>>> ridiculously high estimate and SE value.
>> >>>>
>> >>>> Any suggestions on how to get a decent beta, SE, z, and p?
>> >>>> It's the only comparison missing in the table for the levels
>> >>>> I need so I think it
>> >> would
>> >>>> be a bit unacademic of me to close this deal saying 'the
>> >>>> difference
>> >> could
>> >>>> not be estimated due to zero count'.
>> >>>>
>> >>>> And by the way I have seen this comparison being generated
>> >>>> using other stats.
>> >>>>
>> >>>> Thanks in advance,
>> >>>>
>> >>>> Frank
>> >>>>
>> >>>> [[alternative HTML version deleted]]
>> >>>>
>> >>>> _______________________________________________
>> >>>> R-sig-mixed-models at r-project.org mailing list
>> >>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>> >>> _______________________________________________
>> >>> R-sig-mixed-models at r-project.org mailing list
>> >>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>> >>
>> >
>>
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.11 (GNU/Linux)
>>
>> iQEcBAEBAgAGBQJVZ2DeAAoJEOCV5YRblxUH9f0IAN/LTzJllxXqmdP4U2bbDNOR
>> XnjYDsQ+cF6eR6aRMxWK1nj7Lgdi1pvqOU/3CSMVke2HW2Cr07wR2VDtqHwWRAgZ
>> jTlzlJ/iA5o32T1U2Wm9jrle0E0RpTMrA8SZ8HsGVKT3cD/5TNo9eoPAw3DV45AO
>> hmwUJf0NYLhZwOJ2QAsk1rAn06CBmrVSXFUmdGKpODELFJ4whAn95phE8pLY+aW9
>> qfO4Rq4FcZt1wdRwlZmk8woEeqeySb+rBRxZCVQ0HuyoEGONHMq5Wa1hnffwVR3V
>> yiIo1Vtd7sTbxAs96DeP8AItyHTvgsKRJphEK/PYguDQCGeR70sQEL53FTdHM60=
>> =3UD2
>> -----END PGP SIGNATURE-----
>
>