[R-sig-ME] Zero cells in contrast matrix problem
Ben Bolker
bbolker at gmail.com
Thu May 28 20:39:26 CEST 2015
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 15-05-28 06:55 AM, Francesco Romano wrote:
> Many thanks to both.
>
> The approaches you suggest (and others online) help one deal with
> the separation problem but don't offer any specific advice as to
> how getting a valid p coefficient when comparing two levels of the
> model vexed by separation.
>
> Ben, here's the output of the bglmer which by the way would be
> ideal since it allows me to retain the random effect so that all my
> pairwise comparisons are conducted using mixed effects.
>
>> trial<-bglmer(Correct ~ Syntax.Semantics+(1|Part.name), data =
>> trialglm,
> family = binomial) Warning message: package ‘blme’ was built under
> R version 3.1.2
>> summary(trial)
> Cov prior : Part.name ~ wishart(df = 3.5, scale = Inf,
> posterior.scale = cov, common.scale = TRUE) Prior dev : 1.4371
>
> Generalized linear mixed model fit by maximum likelihood (Laplace
> Approximation) ['bglmerMod'] Family: binomial ( logit ) Formula:
> Correct ~ Syntax.Semantics + (1 | Part.name) Data: trialglm
>
> AIC BIC logLik deviance df.resid 269.9 305.5 -126.0
> 251.9 376
>
> Scaled residuals: Min 1Q Median 3Q Max -0.9828
> -0.4281 -0.2445 -0.0002 5.7872
>
> Random effects: Groups Name Variance Std.Dev. Part.name
> (Intercept) 0.3836 0.6194 Number of obs: 385, groups: Part.name,
> 16
>
> Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept)
> -1.8671 0.4538 -4.114 3.89e-05 *** Syntax.Semantics A
> 0.8121 0.5397 1.505 0.1324 Syntax.Semantics B -16.4391
> 1195.5031 -0.014 0.9890 Syntax.Semantics C -1.1323 0.7462
> -1.517 0.1292 Syntax.Semantics D 0.1789 0.5853 0.306
> 0.7598 Syntax.Semantics E -0.8071 0.7500 -1.076 0.2819
> Syntax.Semantics F -1.5051 0.8575 -1.755 0.0792 .
> Syntax.Semantics G 0.4395 0.5417 0.811 0.4171 ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Unfortunately the separation problem is still there. Should I be
> constraining the parameter somehow? How would I do that? The data
> is below.
Did you read the section in the URL I suggested? Just using bglmer
isn't enough; you also have to set a prior on the fixed effects.
Your data don't seem to be attached (note that the mailing list
strips most non-ASCII file types).
>
> In passing I also tried brglm which solves the separation problem
> but tells me comparison is significant which I don't believe one
> bit (see the data below). I am pretty sure about this because when
> I reveled and look at the comparisons I was able to compute using
> glmer, these turn out to be non-significant, when glmer told me
> they were:
>
>> trial<-brglm(Correct ~ Syntax.Semantics, data = trialglm, family
>> =
> binomial) Warning messages: 1: package ‘elrm’ was built under R
> version 3.1.2 2: package ‘coda’ was built under R version 3.1.3
>> summary(trial)
>
> Call: brglm(formula = Correct ~ Syntax.Semantics, family =
> binomial, data = trialglm)
>
>
> Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept)
> -1.6358 0.4035 -4.053 5.05e-05 *** Syntax.Semantics A 0.6689
> 0.5169 1.294 0.1957 Syntax.Semantics B -3.0182 1.4902
> -2.025 0.0428 * Syntax.Semantics C -1.0135 0.6889 -1.471
> 0.1413 Syntax.Semantics D 0.1515 0.5571 0.272 0.7857
> Syntax.Semantics E -0.7878 0.6937 -1.136 0.2561
> Syntax.Semantics F -1.2874 0.7702 -1.672 0.0946 .
> Syntax.Semantics G 0.4358 0.5186 0.840 0.4007 --- Signif.
> codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> (Dispersion parameter for binomial family taken to be 1)
>
> Null deviance: 262.51 on 384 degrees of freedom Residual
> deviance: 256.22 on 377 degrees of freedom Penalized deviance:
> 245.5554 AIC: 272.22
>
>
> MCMCglmm is too complex for me.
>
> Wolfgang, I tried the penalized likelihood method (logistf
> function) but output is hard to read:
>
>> trial<-logistf(Correct ~ Syntax.Semantics, data = trialglm,
>> family =
> binomial) Warning messages: 1: package ‘logistf’ was built under R
> version 3.1.2 2: package ‘mice’ was built under R version 3.1.2
>> summary(trial)
> logistf(formula = Correct ~ Syntax.Semantics, data = trialglm,
> family = binomial)
>
> Model fitted by Penalized ML Confidence intervals and p-values by
> Profile Likelihood Profile Likelihood Profile Likelihood Profile
> Likelihood Profile Likelihood Profile Likelihood Profile Likelihood
> Profile Likelihood
>
> coef se(coef) lower 0.95 upper 0.95 Chisq p (Intercept)
> 3.2094017 0.7724482 2.9648747 3.5127830 0.000000 1.000000e+00
> Syntax.Semantics A 4.1767737 6.3254344 0.4224696 12.0673987
> 64.224452 1.110223e-15 Syntax.Semantics B -1.0583602 0.8959376
> -1.3963977 -0.7625216 0.000000 1.000000e+00 Syntax.Semantics C
> -0.7299070 0.9308193 -1.0765598 -0.4180076 0.000000 1.000000e+00
> Syntax.Semantics D 0.2314740 1.1563731 -0.1704535 0.6479908
> 1.156512 2.821901e-01 Syntax.Semantics E -0.6476907 0.9771824
> -1.0076740 -0.3164066 0.000000 1.000000e+00 Syntax.Semantics F
> -0.8271499 0.9305931 -1.1743834 -0.5160799 0.000000 1.000000e+00
> Syntax.Semantics G 0.9909046 1.3787175 0.5457741 1.5353981
> 0.000000 1.000000e+00
>
> Likelihood ratio test=121.9841 on 7 df, p=0, n=385 Wald test =
> 5.334321 on 7 df, p = 0.6192356
>
> In particular, what is this model telling me? That Z (my ref level)
> and B are significantly different?
>
> I'm happy to try the elrm function with exact logistic regression
> but I am not capable of programming it. Besides, would it give me
> valid estimates for the comparison between the Z and B levels? The
> data frame should look like this:
>
> Outcome variable (Correct, incorrect) Predictor variable (A, B, C,
> D, E, F, G, Z) Counts (E: 38,3; B: 51,0; Z: 37,7; G: 40,12; D:
> 36,8; C:45,3; A: 34,13; F:65,22).
>
> Thank you! F.
>
> On Thu, May 28, 2015 at 2:28 AM, Ben Bolker <bbolker at gmail.com>
> wrote:
>
>> And for what it's worth, you can do this in conjunction with lme4
>> by using the blme package instead (a thin Bayesian wrapper around
>> lme4), or via the MCMCglmm package; see
>> http://ms.mcmaster.ca/~bolker/R/misc/foxchapter/bolker_chap.html
>> for an example (search for "complete separation").
>>
>> On Wed, May 27, 2015 at 5:21 PM, Viechtbauer Wolfgang (STAT)
>> <wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
>>> You may need to consider using an 'exact', Bayesian, or
>>> penalized
>> likelihood approach (along the lines proposed by Firth).
>>>
>>> Maybe a place to start:
>> http://sas-and-r.blogspot.nl/2010/11/example-815-firth-logistic-regression.html
>>>
>>>
>>
Best,
>>> Wolfgang
>>>
>>>> -----Original Message----- From: R-sig-mixed-models
>>>> [mailto:r-sig-mixed-models-bounces at r- project.org] On Behalf
>>>> Of Francesco Romano Sent: Wednesday, May 27, 2015 23:00 To:
>>>> r-sig-mixed-models at r-project.org Subject: [R-sig-ME] Zero
>>>> cells in contrast matrix problem
>>>>
>>>> After giving up on a glmer for my data, I remembered a post
>>>> by Roger
>> Levy
>>>> suggesting to try the use non mixed effects glm when one of
>>>> the cells in a matrix is zero.
>>>>
>>>> To put this into perspective:
>>>>
>>>>> trial<-glmer(Correct ~ Syntax.Semantics + (1 | Part.name),
>>>>> data =
>>>> trialglm, family = binomial)
>>>>
>>>> Warning messages: 1: In checkConv(attr(opt, "derivs"),
>>>> opt$par, ctrl = control$checkConv, : Model failed to converge
>>>> with max|grad| = 0.053657 (tol = 0.001, component 4) 2: In
>>>> checkConv(attr(opt, "derivs"), opt$par, ctrl =
>>>> control$checkConv, : Model is nearly unidentifiable: large
>>>> eigenvalue ratio - Rescale variables?
>>>>
>>>> My data has a binary outcome, correct or incorrect, a fixed
>>>> effect predictor factor with 8 levels, and a random effect
>>>> for participants. I believe the problem R is encountering is
>>>> with one level of the factor (let us call it level B) which
>>>> has no counts (no I won' t try to post the table from the
>>>> paper with the counts because I know it will get garbled
>>>> up!).
>>>>
>>>> I attempt a glm with the same data:
>>>>
>>>>> trial<-glm(Correct ~ Syntax.Semantics, data = trialglm,
>>>>> family =
>>>> binomial)
>>>>> anova(trial)
>>>> Analysis of Deviance Table
>>>>
>>>> Model: binomial, link: logit
>>>>
>>>> Response: Correct
>>>>
>>>> Terms added sequentially (first to last)
>>>>
>>>>
>>>> Df Deviance Resid. Df Resid. Dev NULL
>>>> 384 289.63 Syntax.Semantics 7 34.651 377
>>>> 254.97
>>>>> summary(trial)
>>>>
>>>> Call: glm(formula = Correct ~ Syntax.Semantics, family =
>>>> binomial, data = trialglm)
>>>>
>>>> Deviance Residuals: Min 1Q Median 3Q
>>>> Max -0.79480 -0.62569 -0.34474 -0.00013 2.52113
>>>>
>>>> Coefficients: Estimate Std. Error z value Pr(>|z|)
>>>> (Intercept) -1.6917 0.4113 -4.113
>>>> 3.91e-05 *** Syntax.Semantics A 0.7013 0.5241 1.338
>>>> 0.1809 Syntax.Semantics B -16.8744 904.5273 -0.019
>>>> 0.9851 Syntax.Semantics C -1.1015 0.7231 -1.523
>>>> 0.1277 Syntax.Semantics D 0.1602 0.5667 0.283
>>>> 0.7774 Syntax.Semantics E -0.8733 0.7267 -1.202
>>>> 0.2295 Syntax.Semantics F -1.4438 0.8312 -1.737
>>>> 0.0824 . Syntax.Semantics G 0.4630 0.5262 0.880
>>>> 0.3789 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05
>>>> ‘.’ 0.1 ‘ ’ 1
>>>>
>>>> (Dispersion parameter for binomial family taken to be 1)
>>>>
>>>> Null deviance: 289.63 on 384 degrees of freedom Residual
>>>> deviance: 254.98 on 377 degrees of freedom AIC: 270.98
>>>>
>>>> Number of Fisher Scoring iterations: 17
>>>>
>>>> The comparison I'm interested in is between level B and the
>>>> reference level but it cannot be estimated as shown by the
>>>> ridiculously high estimate and SE value.
>>>>
>>>> Any suggestions on how to get a decent beta, SE, z, and p?
>>>> It's the only comparison missing in the table for the levels
>>>> I need so I think it
>> would
>>>> be a bit unacademic of me to close this deal saying 'the
>>>> difference
>> could
>>>> not be estimated due to zero count'.
>>>>
>>>> And by the way I have seen this comparison being generated
>>>> using other stats.
>>>>
>>>> Thanks in advance,
>>>>
>>>> Frank
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> _______________________________________________
>>>> R-sig-mixed-models at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQEcBAEBAgAGBQJVZ2DeAAoJEOCV5YRblxUH9f0IAN/LTzJllxXqmdP4U2bbDNOR
XnjYDsQ+cF6eR6aRMxWK1nj7Lgdi1pvqOU/3CSMVke2HW2Cr07wR2VDtqHwWRAgZ
jTlzlJ/iA5o32T1U2Wm9jrle0E0RpTMrA8SZ8HsGVKT3cD/5TNo9eoPAw3DV45AO
hmwUJf0NYLhZwOJ2QAsk1rAn06CBmrVSXFUmdGKpODELFJ4whAn95phE8pLY+aW9
qfO4Rq4FcZt1wdRwlZmk8woEeqeySb+rBRxZCVQ0HuyoEGONHMq5Wa1hnffwVR3V
yiIo1Vtd7sTbxAs96DeP8AItyHTvgsKRJphEK/PYguDQCGeR70sQEL53FTdHM60=
=3UD2
-----END PGP SIGNATURE-----
More information about the R-sig-mixed-models
mailing list