[R-sig-ME] Zero cells in contrast matrix problem

Thu May 28 20:39:26 CEST 2015

On 15-05-28 06:55 AM, Francesco Romano wrote:
> Many thanks to both.
> 
> The approaches you suggest (and others online) help one deal with
> the separation problem but don't offer any specific advice on how
> to get a valid p-value when comparing two levels of a model
> affected by separation.
>
> Ben, here's the output of bglmer, which by the way would be ideal
> since it allows me to retain the random effect, so that all my
> pairwise comparisons are conducted with mixed-effects models.
> 
>> trial<-bglmer(Correct ~ Syntax.Semantics+(1|Part.name), data = trialglm, family = binomial)
> Warning message:
> package ‘blme’ was built under R version 3.1.2
>> summary(trial)
> Cov prior  : Part.name ~ wishart(df = 3.5, scale = Inf, posterior.scale = cov, common.scale = TRUE)
> Prior dev  : 1.4371
>
> Generalized linear mixed model fit by maximum likelihood (Laplace
>   Approximation) ['bglmerMod']
>  Family: binomial  ( logit )
> Formula: Correct ~ Syntax.Semantics + (1 | Part.name)
>    Data: trialglm
>
>      AIC      BIC   logLik deviance df.resid
>    269.9    305.5   -126.0    251.9      376
>
> Scaled residuals:
>     Min      1Q  Median      3Q     Max
> -0.9828 -0.4281 -0.2445 -0.0002  5.7872
>
> Random effects:
>  Groups    Name        Variance Std.Dev.
>  Part.name (Intercept) 0.3836   0.6194
> Number of obs: 385, groups:  Part.name, 16
>
> Fixed effects:
>                     Estimate Std. Error z value Pr(>|z|)
> (Intercept)          -1.8671     0.4538  -4.114 3.89e-05 ***
> Syntax.Semantics A    0.8121     0.5397   1.505   0.1324
> Syntax.Semantics B  -16.4391  1195.5031  -0.014   0.9890
> Syntax.Semantics C   -1.1323     0.7462  -1.517   0.1292
> Syntax.Semantics D    0.1789     0.5853   0.306   0.7598
> Syntax.Semantics E   -0.8071     0.7500  -1.076   0.2819
> Syntax.Semantics F   -1.5051     0.8575  -1.755   0.0792 .
> Syntax.Semantics G    0.4395     0.5417   0.811   0.4171
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
> Unfortunately the separation problem is still there. Should I be 
> constraining the parameter somehow? How would I do that? The data
> is below.

   Did you read the section in the URL I suggested?  Just using bglmer
isn't enough; you also have to set a prior on the fixed effects.
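A minimal sketch of what adding such a prior might look like. The poster's `trialglm` is not available, so the data below are simulated: 16 participants, the 8 factor levels with Z as reference, 3 trials per participant and level, a baseline rate of about 0.16, and level B never showing the modelled outcome. That layout, those rates, and the N(0, 9) prior on every fixed-effect coefficient (sd 3 on the logit scale) are illustrative assumptions, not recommendations for the real data.

```r
## Sketch: bglmer with a normal prior on the fixed effects, fitted to SIMULATED
## data standing in for the poster's 'trialglm' (layout and rates are assumptions).
library(lme4)
library(blme)

set.seed(101)
levs <- c("Z", LETTERS[1:7])                  # Z first, so it is the reference level
sim <- expand.grid(trial = 1:3,
                   Syntax.Semantics = factor(levs, levels = levs),
                   Part.name = factor(1:16))

## Hypothetical participant intercepts on the logit scale (sd 0.6).
u <- rnorm(16, mean = 0, sd = 0.6)
prob <- plogis(qlogis(0.16) + u[as.integer(sim$Part.name)])
prob[sim$Syntax.Semantics == "B"] <- 0        # complete separation for level B
sim$Correct <- rbinom(nrow(sim), size = 1, prob = prob)

## Same model as in the thread, plus a N(0, 9) prior on each of the
## 8 fixed-effect coefficients (intercept + 7 contrasts).
fit_prior <- bglmer(Correct ~ Syntax.Semantics + (1 | Part.name),
                    data = sim, family = binomial,
                    fixef.prior = normal(cov = diag(9, 8)))
summary(fit_prior)
```

With the prior in place the B contrast gets a finite estimate and standard error, but both now depend on the prior width, so refitting with, say, `diag(4, 8)` and `diag(25, 8)` shows how much the B result leans on that choice.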

  Your data don't seem to be attached (note that the mailing list
strips most non-ASCII file types).

> 
> In passing I also tried brglm, which solves the separation problem
> but tells me the B comparison is significant, which I don't believe
> one bit (see the data below). I am pretty sure about this because
> when I releveled and looked at the comparisons I was able to compute
> with glmer, these turned out to be non-significant where glmer had
> told me they were significant:
> 
>> trial<-brglm(Correct ~ Syntax.Semantics, data = trialglm, family = binomial)
> Warning messages:
> 1: package ‘elrm’ was built under R version 3.1.2
> 2: package ‘coda’ was built under R version 3.1.3
>> summary(trial)
>
> Call:
> brglm(formula = Correct ~ Syntax.Semantics, family = binomial, data = trialglm)
>
> Coefficients:
>                    Estimate Std. Error z value Pr(>|z|)
> (Intercept)         -1.6358     0.4035  -4.053 5.05e-05 ***
> Syntax.Semantics A   0.6689     0.5169   1.294   0.1957
> Syntax.Semantics B  -3.0182     1.4902  -2.025   0.0428 *
> Syntax.Semantics C  -1.0135     0.6889  -1.471   0.1413
> Syntax.Semantics D   0.1515     0.5571   0.272   0.7857
> Syntax.Semantics E  -0.7878     0.6937  -1.136   0.2561
> Syntax.Semantics F  -1.2874     0.7702  -1.672   0.0946 .
> Syntax.Semantics G   0.4358     0.5186   0.840   0.4007
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> (Dispersion parameter for binomial family taken to be 1)
>
>     Null deviance: 262.51  on 384  degrees of freedom
> Residual deviance: 256.22  on 377  degrees of freedom
> Penalized deviance: 245.5554
> AIC:  272.22
> 
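Why brglm reports a finite, "significant" B contrast can be seen with base R alone. For a logistic model whose only predictor is one factor (one parameter per level), the bias-reduced (Firth) estimate, which is what brglm computes by default for the logit link, is the same as adding 1/2 to both outcome counts in every level; the Wald SE then comes from the Fisher information at those estimates. The sketch uses the B and Z counts as typed later in the thread and treats the second number in each pair as the modelled outcome. That is an assumption: it is the column in which B has its zero. Those counts sum to 414 rather than the 385 observations in the fits, so the results come out close to, but not identical with, the brglm table.

```r
## Base-R sketch: bias-reduced (Firth) B-vs-Z contrast in a one-factor logit model.
## Counts as typed in the thread; 'y' = second number (assumed modelled outcome),
## 'n' = total for the level.
y_B <- 0; n_B <- 0 + 51
y_Z <- 7; n_Z <- 7 + 37

## In a saturated one-factor model, bias reduction = add 1/2 to each outcome count.
logit_B <- log((y_B + 0.5) / (n_B - y_B + 0.5))
logit_Z <- log((y_Z + 0.5) / (n_Z - y_Z + 0.5))
est_B <- logit_B - logit_Z                     # B coefficient (Z = reference)

## Wald SE from the Fisher information at the penalised fit, using observed n.
info <- function(n, logit) { p <- plogis(logit); n * p * (1 - p) }
se_B <- sqrt(1 / info(n_B, logit_B) + 1 / info(n_Z, logit_Z))
z_B  <- est_B / se_B
p_B  <- 2 * pnorm(-abs(z_B))
round(c(estimate = est_B, se = se_B, z = z_B, p = p_B), 4)
```

The B estimate is thus set largely by the 1/2 pseudo-count placed in B's empty cell, and the reported p-value is a Wald test on that penalised value.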
> 
> MCMCglmm is too complex for me.
> 
> Wolfgang, I tried the penalized likelihood method (the logistf
> function), but the output is hard to read:
> 
>> trial<-logistf(Correct ~ Syntax.Semantics, data = trialglm, family = binomial)
> Warning messages:
> 1: package ‘logistf’ was built under R version 3.1.2
> 2: package ‘mice’ was built under R version 3.1.2
>> summary(trial)
> logistf(formula = Correct ~ Syntax.Semantics, data = trialglm,
>     family = binomial)
>
> Model fitted by Penalized ML
> Confidence intervals and p-values by Profile Likelihood Profile Likelihood
> Profile Likelihood Profile Likelihood Profile Likelihood Profile Likelihood
> Profile Likelihood Profile Likelihood
>
>                          coef  se(coef) lower 0.95 upper 0.95     Chisq            p
> (Intercept)         3.2094017 0.7724482  2.9648747  3.5127830  0.000000 1.000000e+00
> Syntax.Semantics A  4.1767737 6.3254344  0.4224696 12.0673987 64.224452 1.110223e-15
> Syntax.Semantics B -1.0583602 0.8959376 -1.3963977 -0.7625216  0.000000 1.000000e+00
> Syntax.Semantics C -0.7299070 0.9308193 -1.0765598 -0.4180076  0.000000 1.000000e+00
> Syntax.Semantics D  0.2314740 1.1563731 -0.1704535  0.6479908  1.156512 2.821901e-01
> Syntax.Semantics E -0.6476907 0.9771824 -1.0076740 -0.3164066  0.000000 1.000000e+00
> Syntax.Semantics F -0.8271499 0.9305931 -1.1743834 -0.5160799  0.000000 1.000000e+00
> Syntax.Semantics G  0.9909046 1.3787175  0.5457741  1.5353981  0.000000 1.000000e+00
>
> Likelihood ratio test=121.9841 on 7 df, p=0, n=385
> Wald test = 5.334321 on 7 df, p = 0.6192356
> 
> In particular, what is this model telling me? That Z (my ref level)
> and B are significantly different?
> 
> I'm happy to try the elrm function for exact logistic regression,
> but I don't know how to program it. Besides, would it give me valid
> estimates for the comparison between the Z and B levels? The data
> frame should look like this:
>
> Outcome variable: Correct, incorrect
> Predictor variable: A, B, C, D, E, F, G, Z
> Counts: E: 38,3; B: 51,0; Z: 37,7; G: 40,12; D: 36,8; C: 45,3;
> A: 34,13; F: 65,22
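For the single B-versus-Z comparison, an exact conditional test of the odds ratio in the 2 × 2 table is Fisher's exact test, which base R's `fisher.test()` provides without any programming; for one two-level contrast, exact logistic regression reduces to essentially the same conditional hypergeometric test. A minimal sketch using the counts above as typed, keeping only levels B and Z and, importantly, ignoring the participant random effect, so every response is treated as independent:

```r
## Exact conditional test for B vs Z only (base R). Ignores participants and
## all other levels; counts are the ones typed in the thread.
tab <- matrix(c(51, 0,
                37, 7),
              nrow = 2, byrow = TRUE,
              dimnames = list(level   = c("B", "Z"),
                              outcome = c("first", "second")))
ft <- fisher.test(tab)
ft$p.value    # two-sided exact p-value
ft$estimate   # conditional MLE of the odds ratio: Inf, because of the zero cell
ft$conf.int   # 95% CI: finite lower limit, infinite upper limit
```

Because of the zero cell the odds ratio itself is unbounded, so what this yields is a p-value and a one-sided confidence limit rather than a finite beta and SE.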
> 
> Thank you! F.
> 
> On Thu, May 28, 2015 at 2:28 AM, Ben Bolker <bbolker at gmail.com>
> wrote:
> 
>> And for what it's worth, you can do this in conjunction with lme4
>> by using the blme package instead (a thin Bayesian wrapper around
>> lme4), or via the MCMCglmm package; see 
>> http://ms.mcmaster.ca/~bolker/R/misc/foxchapter/bolker_chap.html
>> for an example (search for "complete separation").
>> 
>> On Wed, May 27, 2015 at 5:21 PM, Viechtbauer Wolfgang (STAT) 
>> <wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
>>> You may need to consider using an 'exact', Bayesian, or penalized
>>> likelihood approach (along the lines proposed by Firth).
>>>
>>> Maybe a place to start:
>>> http://sas-and-r.blogspot.nl/2010/11/example-815-firth-logistic-regression.html
>>>
>>> Best,
>>> Wolfgang
>>> 
>>>> -----Original Message-----
>>>> From: R-sig-mixed-models [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Francesco Romano
>>>> Sent: Wednesday, May 27, 2015 23:00
>>>> To: r-sig-mixed-models at r-project.org
>>>> Subject: [R-sig-ME] Zero cells in contrast matrix problem
>>>> 
>>>> After giving up on a glmer for my data, I remembered a post by
>>>> Roger Levy suggesting using a non-mixed-effects glm when one of
>>>> the cells in a matrix is zero.
>>>> 
>>>> To put this into perspective:
>>>> 
>>>>> trial<-glmer(Correct ~ Syntax.Semantics + (1 | Part.name), data = trialglm, family = binomial)
>>>>
>>>> Warning messages:
>>>> 1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
>>>>   Model failed to converge with max|grad| = 0.053657 (tol = 0.001, component 4)
>>>> 2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
>>>>   Model is nearly unidentifiable: large eigenvalue ratio
>>>>  - Rescale variables?
>>>> 
>>>> My data has a binary outcome (correct or incorrect), a fixed-effect
>>>> predictor factor with 8 levels, and a random effect for
>>>> participants. I believe the problem R is encountering is with one
>>>> level of the factor (let us call it level B), which has a zero
>>>> count in one of the two outcome categories (no, I won't try to
>>>> post the table from the paper with the counts because I know it
>>>> will get garbled!).
>>>> 
>>>> I attempt a glm with the same data:
>>>>
>>>>> trial<-glm(Correct ~ Syntax.Semantics, data = trialglm, family = binomial)
>>>>> anova(trial)
>>>> Analysis of Deviance Table
>>>>
>>>> Model: binomial, link: logit
>>>>
>>>> Response: Correct
>>>>
>>>> Terms added sequentially (first to last)
>>>>
>>>>
>>>>                  Df Deviance Resid. Df Resid. Dev
>>>> NULL                               384     289.63
>>>> Syntax.Semantics  7   34.651       377     254.97
>>>>> summary(trial)
>>>>
>>>> Call:
>>>> glm(formula = Correct ~ Syntax.Semantics, family = binomial, data = trialglm)
>>>>
>>>> Deviance Residuals:
>>>>      Min        1Q    Median        3Q       Max
>>>> -0.79480  -0.62569  -0.34474  -0.00013   2.52113
>>>>
>>>> Coefficients:
>>>>                    Estimate Std. Error z value Pr(>|z|)
>>>> (Intercept)         -1.6917     0.4113  -4.113 3.91e-05 ***
>>>> Syntax.Semantics A   0.7013     0.5241   1.338   0.1809
>>>> Syntax.Semantics B -16.8744   904.5273  -0.019   0.9851
>>>> Syntax.Semantics C  -1.1015     0.7231  -1.523   0.1277
>>>> Syntax.Semantics D   0.1602     0.5667   0.283   0.7774
>>>> Syntax.Semantics E  -0.8733     0.7267  -1.202   0.2295
>>>> Syntax.Semantics F  -1.4438     0.8312  -1.737   0.0824 .
>>>> Syntax.Semantics G   0.4630     0.5262   0.880   0.3789
>>>> ---
>>>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>>>
>>>> (Dispersion parameter for binomial family taken to be 1)
>>>>
>>>>     Null deviance: 289.63  on 384  degrees of freedom
>>>> Residual deviance: 254.98  on 377  degrees of freedom
>>>> AIC: 270.98
>>>>
>>>> Number of Fisher Scoring iterations: 17
>>>> 
>>>> The comparison I'm interested in is between level B and the
>>>> reference level, but it cannot be estimated, as shown by the
>>>> absurdly large (in magnitude) estimate and SE.
>>>>
>>>> Any suggestions on how to get a decent beta, SE, z, and p? It's
>>>> the only comparison missing from the table for the levels I need,
>>>> so I think it would be a bit unacademic of me to close this deal
>>>> saying 'the difference could not be estimated due to zero count'.
>>>> 
>>>> And by the way I have seen this comparison being generated
>>>> using other stats.
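The huge estimate and SE are the usual symptom of separation: as the estimate runs off towards minus infinity the SE grows even faster, so the Wald z (estimate/SE) collapses towards 0 and its p-value is meaningless. A likelihood-ratio test does not rely on that SE and stays usable. A minimal sketch in base R with only levels B and Z, using the counts posted later in the thread (B: 51,0; Z: 37,7) as aggregated binomial data; treating the second number as the modelled outcome is an assumption (it is the column where B has its zero), and, like a plain glm, this ignores participants.

```r
## LRT vs Wald for the B-vs-Z contrast, aggregated counts, base R only.
d <- data.frame(level  = factor(c("Z", "B"), levels = c("Z", "B")),
                second = c(7, 0),    # assumed modelled outcome
                first  = c(37, 51))

## Full model (separate B and Z rates) vs. null model (one common rate).
full <- glm(cbind(second, first) ~ level, family = binomial, data = d)
null <- glm(cbind(second, first) ~ 1,     family = binomial, data = d)

## Wald row: enormous estimate and SE, z close to 0, meaningless p.
coef(summary(full))["levelB", ]

## Likelihood-ratio (deviance) test of the same contrast, 1 df.
lrt <- anova(null, full, test = "Chisq")
lrt
```

This is closely related to the profile-likelihood p-values that logistf reports: it gives a usable test of B against Z, although the point estimate itself stays infinite without a penalty or prior.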
>>>> 
>>>> Thanks in advance,
>>>> 
>>>> Frank
>>>> 
>>>> 
>>>> _______________________________________________ 
>>>> R-sig-mixed-models at r-project.org mailing list 
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>> 
> 
