[R-sig-ME] Zero cells in contrast matrix problem
Ben Bolker
bbolker at gmail.com
Fri May 29 20:32:24 CEST 2015
[Please keep r-sig-mixed-models in the cc: list ...]
It would seem consistent to me to use the penalized/Bayesian results
throughout, not just for the completely separated term. (It's not
"MCMCglmm-adjusted", unless I'm missing something ...)
On Fri, May 29, 2015 at 2:30 PM, Francesco Romano
<francescobryanromano at gmail.com> wrote:
> Ben that worked a charm!
> I guess now I'm only left with the question of whether it would be more
> appropriate to report the stats for all the pairwise comparisons, including
> those that were not subject to complete separation, from the
> MCMCglmm-adjusted model. In other words, the separation problem was only
> relevant to level B because there were no incorrect answers at all.
>
> Frank
>
> On Thu, May 28, 2015 at 8:39 PM, Ben Bolker <bbolker at gmail.com> wrote:
>>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> On 15-05-28 06:55 AM, Francesco Romano wrote:
>> > Many thanks to both.
>> >
>> > The approaches you suggest (and others online) help one deal with
>> > the separation problem but don't offer any specific advice as to
>> > how getting a valid p coefficient when comparing two levels of the
>> > model vexed by separation.
>> >
>> > Ben, here's the output of the bglmer which by the way would be
>> > ideal since it allows me to retain the random effect so that all my
>> > pairwise comparisons are conducted using mixed effects.
>> >
>> >> trial<-bglmer(Correct ~ Syntax.Semantics+(1|Part.name), data =
>> >> trialglm,
>> > family = binomial) Warning message: package ‘blme’ was built under
>> > R version 3.1.2
>> >> summary(trial)
>> > Cov prior : Part.name ~ wishart(df = 3.5, scale = Inf,
>> > posterior.scale = cov, common.scale = TRUE) Prior dev : 1.4371
>> >
>> > Generalized linear mixed model fit by maximum likelihood (Laplace
>> > Approximation) ['bglmerMod'] Family: binomial ( logit ) Formula:
>> > Correct ~ Syntax.Semantics + (1 | Part.name) Data: trialglm
>> >
>> > AIC BIC logLik deviance df.resid 269.9 305.5 -126.0
>> > 251.9 376
>> >
>> > Scaled residuals: Min 1Q Median 3Q Max -0.9828
>> > -0.4281 -0.2445 -0.0002 5.7872
>> >
>> > Random effects: Groups Name Variance Std.Dev. Part.name
>> > (Intercept) 0.3836 0.6194 Number of obs: 385, groups: Part.name,
>> > 16
>> >
>> > Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept)
>> > -1.8671 0.4538 -4.114 3.89e-05 *** Syntax.Semantics A
>> > 0.8121 0.5397 1.505 0.1324 Syntax.Semantics B -16.4391
>> > 1195.5031 -0.014 0.9890 Syntax.Semantics C -1.1323 0.7462
>> > -1.517 0.1292 Syntax.Semantics D 0.1789 0.5853 0.306
>> > 0.7598 Syntax.Semantics E -0.8071 0.7500 -1.076 0.2819
>> > Syntax.Semantics F -1.5051 0.8575 -1.755 0.0792 .
>> > Syntax.Semantics G 0.4395 0.5417 0.811 0.4171 ---
>> > Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>> >
>> > Unfortunately the separation problem is still there. Should I be
>> > constraining the parameter somehow? How would I do that? The data
>> > is below.
>>
>> Did you read the section in the URL I suggested? Just using bglmer
>> isn't enough; you also have to set a prior on the fixed effects.
>>
>> Your data don't seem to be attached (note that the mailing list
>> strips most non-ASCII file types).
>>
>> >
>> > In passing I also tried brglm which solves the separation problem
>> > but tells me comparison is significant which I don't believe one
>> > bit (see the data below). I am pretty sure about this because when
>> > I reveled and look at the comparisons I was able to compute using
>> > glmer, these turn out to be non-significant, when glmer told me
>> > they were:
>> >
>> >> trial<-brglm(Correct ~ Syntax.Semantics, data = trialglm, family
>> >> =
>> > binomial) Warning messages: 1: package ‘elrm’ was built under R
>> > version 3.1.2 2: package ‘coda’ was built under R version 3.1.3
>> >> summary(trial)
>> >
>> > Call: brglm(formula = Correct ~ Syntax.Semantics, family =
>> > binomial, data = trialglm)
>> >
>> >
>> > Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept)
>> > -1.6358 0.4035 -4.053 5.05e-05 *** Syntax.Semantics A 0.6689
>> > 0.5169 1.294 0.1957 Syntax.Semantics B -3.0182 1.4902
>> > -2.025 0.0428 * Syntax.Semantics C -1.0135 0.6889 -1.471
>> > 0.1413 Syntax.Semantics D 0.1515 0.5571 0.272 0.7857
>> > Syntax.Semantics E -0.7878 0.6937 -1.136 0.2561
>> > Syntax.Semantics F -1.2874 0.7702 -1.672 0.0946 .
>> > Syntax.Semantics G 0.4358 0.5186 0.840 0.4007 --- Signif.
>> > codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>> >
>> > (Dispersion parameter for binomial family taken to be 1)
>> >
>> > Null deviance: 262.51 on 384 degrees of freedom Residual
>> > deviance: 256.22 on 377 degrees of freedom Penalized deviance:
>> > 245.5554 AIC: 272.22
>> >
>> >
>> > MCMCglmm is too complex for me.
>> >
>> > Wolfgang, I tried the penalized likelihood method (logistf
>> > function) but output is hard to read:
>> >
>> >> trial<-logistf(Correct ~ Syntax.Semantics, data = trialglm,
>> >> family =
>> > binomial) Warning messages: 1: package ‘logistf’ was built under R
>> > version 3.1.2 2: package ‘mice’ was built under R version 3.1.2
>> >> summary(trial)
>> > logistf(formula = Correct ~ Syntax.Semantics, data = trialglm,
>> > family = binomial)
>> >
>> > Model fitted by Penalized ML Confidence intervals and p-values by
>> > Profile Likelihood Profile Likelihood Profile Likelihood Profile
>> > Likelihood Profile Likelihood Profile Likelihood Profile Likelihood
>> > Profile Likelihood
>> >
>> > coef se(coef) lower 0.95 upper 0.95 Chisq p (Intercept)
>> > 3.2094017 0.7724482 2.9648747 3.5127830 0.000000 1.000000e+00
>> > Syntax.Semantics A 4.1767737 6.3254344 0.4224696 12.0673987
>> > 64.224452 1.110223e-15 Syntax.Semantics B -1.0583602 0.8959376
>> > -1.3963977 -0.7625216 0.000000 1.000000e+00 Syntax.Semantics C
>> > -0.7299070 0.9308193 -1.0765598 -0.4180076 0.000000 1.000000e+00
>> > Syntax.Semantics D 0.2314740 1.1563731 -0.1704535 0.6479908
>> > 1.156512 2.821901e-01 Syntax.Semantics E -0.6476907 0.9771824
>> > -1.0076740 -0.3164066 0.000000 1.000000e+00 Syntax.Semantics F
>> > -0.8271499 0.9305931 -1.1743834 -0.5160799 0.000000 1.000000e+00
>> > Syntax.Semantics G 0.9909046 1.3787175 0.5457741 1.5353981
>> > 0.000000 1.000000e+00
>> >
>> > Likelihood ratio test=121.9841 on 7 df, p=0, n=385 Wald test =
>> > 5.334321 on 7 df, p = 0.6192356
>> >
>> > In particular, what is this model telling me? That Z (my ref level)
>> > and B are significantly different?
>> >
>> > I'm happy to try the elrm function with exact logistic regression
>> > but I am not capable of programming it. Besides, would it give me
>> > valid estimates for the comparison between the Z and B levels? The
>> > data frame should look like this:
>> >
>> > Outcome variable (Correct, incorrect) Predictor variable (A, B, C,
>> > D, E, F, G, Z) Counts (E: 38,3; B: 51,0; Z: 37,7; G: 40,12; D:
>> > 36,8; C:45,3; A: 34,13; F:65,22).
>> >
>> > Thank you! F.
>> >
>> > On Thu, May 28, 2015 at 2:28 AM, Ben Bolker <bbolker at gmail.com>
>> > wrote:
>> >
>> >> And for what it's worth, you can do this in conjunction with lme4
>> >> by using the blme package instead (a thin Bayesian wrapper around
>> >> lme4), or via the MCMCglmm package; see
>> >> http://ms.mcmaster.ca/~bolker/R/misc/foxchapter/bolker_chap.html
>> >> for an example (search for "complete separation").
>> >>
>> >> On Wed, May 27, 2015 at 5:21 PM, Viechtbauer Wolfgang (STAT)
>> >> <wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
>> >>> You may need to consider using an 'exact', Bayesian, or
>> >>> penalized
>> >> likelihood approach (along the lines proposed by Firth).
>> >>>
>> >>> Maybe a place to start:
>> >>
>> >> http://sas-and-r.blogspot.nl/2010/11/example-815-firth-logistic-regression.html
>> >>>
>> >>>
>> >>
>> Best,
>> >>> Wolfgang
>> >>>
>> >>>> -----Original Message----- From: R-sig-mixed-models
>> >>>> [mailto:r-sig-mixed-models-bounces at r- project.org] On Behalf
>> >>>> Of Francesco Romano Sent: Wednesday, May 27, 2015 23:00 To:
>> >>>> r-sig-mixed-models at r-project.org Subject: [R-sig-ME] Zero
>> >>>> cells in contrast matrix problem
>> >>>>
>> >>>> After giving up on a glmer for my data, I remembered a post
>> >>>> by Roger
>> >> Levy
>> >>>> suggesting to try the use non mixed effects glm when one of
>> >>>> the cells in a matrix is zero.
>> >>>>
>> >>>> To put this into perspective:
>> >>>>
>> >>>>> trial<-glmer(Correct ~ Syntax.Semantics + (1 | Part.name),
>> >>>>> data =
>> >>>> trialglm, family = binomial)
>> >>>>
>> >>>> Warning messages: 1: In checkConv(attr(opt, "derivs"),
>> >>>> opt$par, ctrl = control$checkConv, : Model failed to converge
>> >>>> with max|grad| = 0.053657 (tol = 0.001, component 4) 2: In
>> >>>> checkConv(attr(opt, "derivs"), opt$par, ctrl =
>> >>>> control$checkConv, : Model is nearly unidentifiable: large
>> >>>> eigenvalue ratio - Rescale variables?
>> >>>>
>> >>>> My data has a binary outcome, correct or incorrect, a fixed
>> >>>> effect predictor factor with 8 levels, and a random effect
>> >>>> for participants. I believe the problem R is encountering is
>> >>>> with one level of the factor (let us call it level B) which
>> >>>> has no counts (no I won' t try to post the table from the
>> >>>> paper with the counts because I know it will get garbled
>> >>>> up!).
>> >>>>
>> >>>> I attempt a glm with the same data:
>> >>>>
>> >>>>> trial<-glm(Correct ~ Syntax.Semantics, data = trialglm,
>> >>>>> family =
>> >>>> binomial)
>> >>>>> anova(trial)
>> >>>> Analysis of Deviance Table
>> >>>>
>> >>>> Model: binomial, link: logit
>> >>>>
>> >>>> Response: Correct
>> >>>>
>> >>>> Terms added sequentially (first to last)
>> >>>>
>> >>>>
>> >>>> Df Deviance Resid. Df Resid. Dev NULL
>> >>>> 384 289.63 Syntax.Semantics 7 34.651 377
>> >>>> 254.97
>> >>>>> summary(trial)
>> >>>>
>> >>>> Call: glm(formula = Correct ~ Syntax.Semantics, family =
>> >>>> binomial, data = trialglm)
>> >>>>
>> >>>> Deviance Residuals: Min 1Q Median 3Q
>> >>>> Max -0.79480 -0.62569 -0.34474 -0.00013 2.52113
>> >>>>
>> >>>> Coefficients: Estimate Std. Error z value Pr(>|z|)
>> >>>> (Intercept) -1.6917 0.4113 -4.113
>> >>>> 3.91e-05 *** Syntax.Semantics A 0.7013 0.5241 1.338
>> >>>> 0.1809 Syntax.Semantics B -16.8744 904.5273 -0.019
>> >>>> 0.9851 Syntax.Semantics C -1.1015 0.7231 -1.523
>> >>>> 0.1277 Syntax.Semantics D 0.1602 0.5667 0.283
>> >>>> 0.7774 Syntax.Semantics E -0.8733 0.7267 -1.202
>> >>>> 0.2295 Syntax.Semantics F -1.4438 0.8312 -1.737
>> >>>> 0.0824 . Syntax.Semantics G 0.4630 0.5262 0.880
>> >>>> 0.3789 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05
>> >>>> ‘.’ 0.1 ‘ ’ 1
>> >>>>
>> >>>> (Dispersion parameter for binomial family taken to be 1)
>> >>>>
>> >>>> Null deviance: 289.63 on 384 degrees of freedom Residual
>> >>>> deviance: 254.98 on 377 degrees of freedom AIC: 270.98
>> >>>>
>> >>>> Number of Fisher Scoring iterations: 17
>> >>>>
>> >>>> The comparison I'm interested in is between level B and the
>> >>>> reference level but it cannot be estimated as shown by the
>> >>>> ridiculously high estimate and SE value.
>> >>>>
>> >>>> Any suggestions on how to get a decent beta, SE, z, and p?
>> >>>> It's the only comparison missing in the table for the levels
>> >>>> I need so I think it
>> >> would
>> >>>> be a bit unacademic of me to close this deal saying 'the
>> >>>> difference
>> >> could
>> >>>> not be estimated due to zero count'.
>> >>>>
>> >>>> And by the way I have seen this comparison being generated
>> >>>> using other stats.
>> >>>>
>> >>>> Thanks in advance,
>> >>>>
>> >>>> Frank
>> >>>>
>> >>>> [[alternative HTML version deleted]]
>> >>>>
>> >>>> _______________________________________________
>> >>>> R-sig-mixed-models at r-project.org mailing list
>> >>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>> >>> _______________________________________________
>> >>> R-sig-mixed-models at r-project.org mailing list
>> >>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>> >>
>> >
>>
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.11 (GNU/Linux)
>>
>> iQEcBAEBAgAGBQJVZ2DeAAoJEOCV5YRblxUH9f0IAN/LTzJllxXqmdP4U2bbDNOR
>> XnjYDsQ+cF6eR6aRMxWK1nj7Lgdi1pvqOU/3CSMVke2HW2Cr07wR2VDtqHwWRAgZ
>> jTlzlJ/iA5o32T1U2Wm9jrle0E0RpTMrA8SZ8HsGVKT3cD/5TNo9eoPAw3DV45AO
>> hmwUJf0NYLhZwOJ2QAsk1rAn06CBmrVSXFUmdGKpODELFJ4whAn95phE8pLY+aW9
>> qfO4Rq4FcZt1wdRwlZmk8woEeqeySb+rBRxZCVQ0HuyoEGONHMq5Wa1hnffwVR3V
>> yiIo1Vtd7sTbxAs96DeP8AItyHTvgsKRJphEK/PYguDQCGeR70sQEL53FTdHM60=
>> =3UD2
>> -----END PGP SIGNATURE-----
>
>
More information about the R-sig-mixed-models
mailing list