[R-sig-ME] Zero cells in contrast matrix problem

Mon Oct 26 11:56:48 CET 2015

I wonder if anyone can help with the separation problem originally solved
by Ben Bolker (see thread).
The model and fitting I used previously was

trial<-bglmer(Correct ~ Syntax.Semantics, data = trialglm, family =
binomial, fixef.prior = normal(cov = diag(9,4))

which now has to change because the Syntax.Semantcs factor needs to be
split into separate
within-subjects variables, Syntax, a factor with two levels, and Animacy, a
factor with four levels.
In addition a new between-subjects factor called Group with two levels
(native vs non-native speaker)
has to be added which determines the following model, fit by bglmer:

trial<-bglmer(Correct ~ Syntax*Animacy*Group+ (1|Part.name)+(1|Item), data
= trialglm, family = binomial,
fixef.prior = normal(cov = diag???)

What values should I use for the cov=diag portion in order to continue
attempting convergence of a model
that includes the random effects?

R returns the following error because I don't know how to establish the
parameters when more than one
fixed effect is involved:

Error in normal(cov = cov, common.scale = FALSE) :
  normal prior covariance of improper length

Many thanks in advance for any help!

On Thu, May 28, 2015 at 10:46 PM, Ben Bolker <bbolker at gmail.com> wrote:

>   I don't see your data -- I see a little tiny subset, but that's not
> really enough for a reproducible example.
>
> This is the example given in the URL I sent:
>
> cmod_blme_L2 <- bglmer(predation~ttt+(1|block),data=newdat,
>                        family=binomial,
>                        fixef.prior = normal(cov = diag(9,4)))
>
> trial<-bglmer(Correct ~ Syntax.Semantics+(1|Part.name),
>           data =trialglm,
>          family = binomial,
>         fixef.prior = normal(cov=diag(9,8)))
>
> The last line specifies an 8x8 matrix (because you have 8 fixed effect
> parameters) with a value of 9 on the diagonal, meaning the priors for
> the fixed effects are independent and each is Normal with a sd of
> sqrt(9)=3.
>
>
> On Thu, May 28, 2015 at 3:25 PM, Francesco Romano
> <francescobryanromano at gmail.com> wrote:
> > Yes but this seems a bit above my head without your help. The data are in
> > the three variables at the bottom of my email but I forgot to mention the
> > random participant effect (n = 17). Thanks!
> >
> >
> > Il giovedì 28 maggio 2015, Ben Bolker <bbolker at gmail.com> ha scritto:
> >>
> >> -----BEGIN PGP SIGNED MESSAGE-----
> >> Hash: SHA1
> >>
> >> On 15-05-28 06:55 AM, Francesco Romano wrote:
> >> > Many thanks to both.
> >> >
> >> > The approaches you suggest (and others online) help one deal with
> >> > the separation problem but don't offer any specific advice as to
> >> > how getting a valid p coefficient when comparing two levels of the
> >> > model vexed by separation.
> >> >
> >> > Ben, here's the output of the bglmer which by the way would be
> >> > ideal since it allows me to retain the random effect so that all my
> >> > pairwise comparisons are conducted using mixed effects.
> >> >
> >> >> trial<-bglmer(Correct ~ Syntax.Semantics+(1|Part.name), data =
> >> >> trialglm,
> >> > family = binomial) Warning message: package ‘blme’ was built under
> >> > R version 3.1.2
> >> >> summary(trial)
> >> > Cov prior  : Part.name ~ wishart(df = 3.5, scale = Inf,
> >> > posterior.scale = cov, common.scale = TRUE) Prior dev  : 1.4371
> >> >
> >> > Generalized linear mixed model fit by maximum likelihood (Laplace
> >> > Approximation) ['bglmerMod'] Family: binomial  ( logit ) Formula:
> >> > Correct ~ Syntax.Semantics + (1 | Part.name) Data: trialglm
> >> >
> >> > AIC      BIC   logLik deviance df.resid 269.9    305.5   -126.0
> >> > 251.9      376
> >> >
> >> > Scaled residuals: Min      1Q  Median      3Q     Max -0.9828
> >> > -0.4281 -0.2445 -0.0002  5.7872
> >> >
> >> > Random effects: Groups    Name        Variance Std.Dev. Part.name
> >> > (Intercept) 0.3836   0.6194 Number of obs: 385, groups:  Part.name,
> >> > 16
> >> >
> >> > Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept)
> >> > -1.8671     0.4538  -4.114 3.89e-05 *** Syntax.Semantics A
> >> > 0.8121     0.5397   1.505   0.1324 Syntax.Semantics B  -16.4391
> >> > 1195.5031  -0.014   0.9890 Syntax.Semantics C   -1.1323     0.7462
> >> > -1.517   0.1292 Syntax.Semantics D    0.1789     0.5853   0.306
> >> > 0.7598 Syntax.Semantics E   -0.8071     0.7500  -1.076   0.2819
> >> > Syntax.Semantics F   -1.5051     0.8575  -1.755   0.0792 .
> >> > Syntax.Semantics G    0.4395     0.5417   0.811   0.4171 ---
> >> > Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >> >
> >> > Unfortunately the separation problem is still there. Should I be
> >> > constraining the parameter somehow? How would I do that? The data
> >> > is below.
> >>
> >>    Did you read the section in the URL I suggested?  Just using bglmer
> >> isn't enough; you also have to set a prior on the fixed effects.
> >>
> >>   Your data don't seem to be attached (note that the mailing list
> >> strips most non-ASCII file types).
> >>
> >> >
> >> > In passing I also tried brglm which solves the separation problem
> >> > but tells me comparison is significant which I don't believe one
> >> > bit (see the data below). I am pretty sure about this because when
> >> > I reveled and look at the comparisons I was able to compute using
> >> > glmer, these turn out to be non-significant, when glmer told me
> >> > they were:
> >> >
> >> >> trial<-brglm(Correct ~ Syntax.Semantics, data = trialglm, family
> >> >> =
> >> > binomial) Warning messages: 1: package ‘elrm’ was built under R
> >> > version 3.1.2 2: package ‘coda’ was built under R version 3.1.3
> >> >> summary(trial)
> >> >
> >> > Call: brglm(formula = Correct ~ Syntax.Semantics, family =
> >> > binomial, data = trialglm)
> >> >
> >> >
> >> > Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept)
> >> > -1.6358     0.4035  -4.053 5.05e-05 *** Syntax.Semantics A   0.6689
> >> > 0.5169   1.294   0.1957 Syntax.Semantics B  -3.0182     1.4902
> >> > -2.025   0.0428 * Syntax.Semantics C  -1.0135     0.6889  -1.471
> >> > 0.1413 Syntax.Semantics D   0.1515     0.5571   0.272   0.7857
> >> > Syntax.Semantics E  -0.7878     0.6937  -1.136   0.2561
> >> > Syntax.Semantics F  -1.2874     0.7702  -1.672   0.0946 .
> >> > Syntax.Semantics G   0.4358     0.5186   0.840   0.4007 --- Signif.
> >> > codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >> >
> >> > (Dispersion parameter for binomial family taken to be 1)
> >> >
> >> > Null deviance: 262.51  on 384  degrees of freedom Residual
> >> > deviance: 256.22  on 377  degrees of freedom Penalized deviance:
> >> > 245.5554 AIC:  272.22
> >> >
> >> >
> >> > MCMCglmm is too complex for me.
> >> >
> >> > Wolfgang, I tried the penalized likelihood method (logistf
> >> > function) but output is hard to read:
> >> >
> >> >> trial<-logistf(Correct ~ Syntax.Semantics, data = trialglm,
> >> >> family =
> >> > binomial) Warning messages: 1: package ‘logistf’ was built under R
> >> > version 3.1.2 2: package ‘mice’ was built under R version 3.1.2
> >> >> summary(trial)
> >> > logistf(formula = Correct ~ Syntax.Semantics, data = trialglm,
> >> > family = binomial)
> >> >
> >> > Model fitted by Penalized ML Confidence intervals and p-values by
> >> > Profile Likelihood Profile Likelihood Profile Likelihood Profile
> >> > Likelihood Profile Likelihood Profile Likelihood Profile Likelihood
> >> > Profile Likelihood
> >> >
> >> > coef  se(coef) lower 0.95 upper 0.95 Chisq            p (Intercept)
> >> > 3.2094017 0.7724482  2.9648747  3.5127830 0.000000 1.000000e+00
> >> > Syntax.Semantics A  4.1767737 6.3254344  0.4224696 12.0673987
> >> > 64.224452 1.110223e-15 Syntax.Semantics B -1.0583602 0.8959376
> >> > -1.3963977 -0.7625216  0.000000 1.000000e+00 Syntax.Semantics C
> >> > -0.7299070 0.9308193 -1.0765598 -0.4180076  0.000000 1.000000e+00
> >> > Syntax.Semantics D  0.2314740 1.1563731 -0.1704535  0.6479908
> >> > 1.156512 2.821901e-01 Syntax.Semantics E -0.6476907 0.9771824
> >> > -1.0076740 -0.3164066  0.000000 1.000000e+00 Syntax.Semantics F
> >> > -0.8271499 0.9305931 -1.1743834 -0.5160799  0.000000 1.000000e+00
> >> > Syntax.Semantics G  0.9909046 1.3787175  0.5457741  1.5353981
> >> > 0.000000 1.000000e+00
> >> >
> >> > Likelihood ratio test=121.9841 on 7 df, p=0, n=385 Wald test =
> >> > 5.334321 on 7 df, p = 0.6192356
> >> >
> >> > In particular, what is this model telling me? That Z (my ref level)
> >> > and B are significantly different?
> >> >
> >> > I'm happy to try the elrm function with exact logistic regression
> >> > but I am not capable of programming it. Besides, would it give me
> >> > valid estimates for the comparison between the Z and B levels? The
> >> > data frame should look like this:
> >> >
> >> > Outcome variable (Correct, incorrect) Predictor variable (A, B, C,
> >> > D, E, F, G, Z) Counts (E: 38,3; B: 51,0; Z: 37,7; G: 40,12; D:
> >> > 36,8; C:45,3; A: 34,13; F:65,22).
> >> >
> >> > Thank you! F.
> >> >
> >> > On Thu, May 28, 2015 at 2:28 AM, Ben Bolker <bbolker at gmail.com>
> >> > wrote:
> >> >
> >> >> And for what it's worth, you can do this in conjunction with lme4
> >> >> by using the blme package instead (a thin Bayesian wrapper around
> >> >> lme4), or via the MCMCglmm package; see
> >> >> http://ms.mcmaster.ca/~bolker/R/misc/foxchapter/bolker_chap.html
> >> >> for an example (search for "complete separation").
> >> >>
> >> >> On Wed, May 27, 2015 at 5:21 PM, Viechtbauer Wolfgang (STAT)
> >> >> <wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
> >> >>> You may need to consider using an 'exact', Bayesian, or
> >> >>> penalized
> >> >> likelihood approach (along the lines proposed by Firth).
> >> >>>
> >> >>> Maybe a place to start:
> >> >>
> >> >>
> http://sas-and-r.blogspot.nl/2010/11/example-815-firth-logistic-regression.html
> >> >>>
> >> >>>
> >> >>
> >> Best,
> >> >>> Wolfgang
> >> >>>
> >> >>>> -----Original Message----- From: R-sig-mixed-models
> >> >>>> [mailto:r-sig-mixed-models-bounces at r- project.org] On Behalf
> >> >>>> Of Francesco Romano Sent: Wednesday, May 27, 2015 23:00 To:
> >> >>>> r-sig-mixed-models at r-project.org Subject: [R-sig-ME] Zero
> >> >>>> cells in contrast matrix problem
> >> >>>>
> >> >>>> After giving up on a glmer for my data, I remembered a post
> >> >>>> by Roger
> >> >> Levy
> >> >>>> suggesting to try the use non mixed effects glm when one of
> >> >>>> the cells in a matrix is zero.
> >> >>>>
> >> >>>> To put this into perspective:
> >> >>>>
> >> >>>>> trial<-glmer(Correct ~ Syntax.Semantics + (1 | Part.name),
> >> >>>>> data =
> >> >>>> trialglm, family = binomial)
> >> >>>>
> >> >>>> Warning messages: 1: In checkConv(attr(opt, "derivs"),
> >> >>>> opt$par, ctrl = control$checkConv, : Model failed to converge
> >> >>>> with max|grad| = 0.053657 (tol = 0.001, component 4) 2: In
> >> >>>> checkConv(attr(opt, "derivs"), opt$par, ctrl =
> >> >>>> control$checkConv, : Model is nearly unidentifiable: large
> >> >>>> eigenvalue ratio - Rescale variables?
> >> >>>>
> >> >>>> My data has a binary outcome, correct or incorrect, a fixed
> >> >>>> effect predictor factor with 8 levels, and a random effect
> >> >>>> for participants. I believe the problem R is encountering is
> >> >>>> with one level of the factor (let us call it level B) which
> >> >>>> has no counts (no I won' t try to post the table from the
> >> >>>> paper with the counts because I know it will get garbled
> >> >>>> up!).
> >> >>>>
> >> >>>> I attempt a glm with the same data:
> >> >>>>
> >> >>>>> trial<-glm(Correct ~ Syntax.Semantics, data = trialglm,
> >> >>>>> family =
> >> >>>> binomial)
> >> >>>>> anova(trial)
> >> >>>> Analysis of Deviance Table
> >> >>>>
> >> >>>> Model: binomial, link: logit
> >> >>>>
> >> >>>> Response: Correct
> >> >>>>
> >> >>>> Terms added sequentially (first to last)
> >> >>>>
> >> >>>>
> >> >>>> Df Deviance Resid. Df Resid. Dev NULL
> >> >>>> 384     289.63 Syntax.Semantics  7   34.651       377
> >> >>>> 254.97
> >> >>>>> summary(trial)
> >> >>>>
> >> >>>> Call: glm(formula = Correct ~ Syntax.Semantics, family =
> >> >>>> binomial, data = trialglm)
> >> >>>>
> >> >>>> Deviance Residuals: Min        1Q    Median        3Q
> >> >>>> Max -0.79480  -0.62569  -0.34474  -0.00013   2.52113
> >> >>>>
> >> >>>> Coefficients: Estimate Std. Error z value Pr(>|z|)
> >> >>>> (Intercept)                 -1.6917     0.4113  -4.113
> >> >>>> 3.91e-05 *** Syntax.Semantics A   0.7013     0.5241   1.338
> >> >>>> 0.1809 Syntax.Semantics B -16.8744   904.5273  -0.019
> >> >>>> 0.9851 Syntax.Semantics C  -1.1015     0.7231  -1.523
> >> >>>> 0.1277 Syntax.Semantics D   0.1602     0.5667   0.283
> >> >>>> 0.7774 Syntax.Semantics E  -0.8733     0.7267  -1.202
> >> >>>> 0.2295 Syntax.Semantics F  -1.4438     0.8312  -1.737
> >> >>>> 0.0824 . Syntax.Semantics G   0.4630     0.5262   0.880
> >> >>>> 0.3789 --- Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05
> >> >>>> ‘.’ 0.1 ‘ ’ 1
> >> >>>>
> >> >>>> (Dispersion parameter for binomial family taken to be 1)
> >> >>>>
> >> >>>> Null deviance: 289.63  on 384  degrees of freedom Residual
> >> >>>> deviance: 254.98  on 377  degrees of freedom AIC: 270.98
> >> >>>>
> >> >>>> Number of Fisher Scoring iterations: 17
> >> >>>>
> >> >>>> The comparison I'm interested in is between level B and the
> >> >>>> reference level but it cannot be estimated as shown by the
> >> >>>> ridiculously high estimate and SE value.
> >> >>>>
> >> >>>> Any suggestions on how to get a decent beta, SE, z, and p?
> >> >>>> It's the only comparison missing in the table for the levels
> >> >>>> I need so I think it
> >> >> would
> >> >>>> be a bit unacademic of me to close this deal saying 'the
> >> >>>> difference
> >> >> could
> >> >>>> not be estimated due to zero count'.
> >> >>>>
> >> >>>> And by the way I have seen this comparison being generated
> >> >>>> using other stats.
> >> >>>>
> >> >>>> Thanks in advance,
> >> >>>>
> >> >>>> Frank
> >> >>>>
> >> >>>> [[alternative HTML version deleted]]
> >> >>>>
> >> >>>> _______________________________________________
> >> >>>> R-sig-mixed-models at r-project.org mailing list
> >> >>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >> >>> _______________________________________________
> >> >>> R-sig-mixed-models at r-project.org mailing list
> >> >>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >> >>
> >> >
> >>
> >> -----BEGIN PGP SIGNATURE-----
> >> Version: GnuPG v1.4.11 (GNU/Linux)
> >>
> >> iQEcBAEBAgAGBQJVZ2DeAAoJEOCV5YRblxUH9f0IAN/LTzJllxXqmdP4U2bbDNOR
> >> XnjYDsQ+cF6eR6aRMxWK1nj7Lgdi1pvqOU/3CSMVke2HW2Cr07wR2VDtqHwWRAgZ
> >> jTlzlJ/iA5o32T1U2Wm9jrle0E0RpTMrA8SZ8HsGVKT3cD/5TNo9eoPAw3DV45AO
> >> hmwUJf0NYLhZwOJ2QAsk1rAn06CBmrVSXFUmdGKpODELFJ4whAn95phE8pLY+aW9
> >> qfO4Rq4FcZt1wdRwlZmk8woEeqeySb+rBRxZCVQ0HuyoEGONHMq5Wa1hnffwVR3V
> >> yiIo1Vtd7sTbxAs96DeP8AItyHTvgsKRJphEK/PYguDQCGeR70sQEL53FTdHM60=
> >> =3UD2
> >> -----END PGP SIGNATURE-----
> >
> >
> >
> > --
> > Sent from Gmail for IPhone
>

-- 
Frank Romano Ph.D.

Tel. +39 3911639149

*LinkedIn*
https://it.linkedin.com/pub/francesco-bryan-romano/33/1/162

*Academia.edu*
https://sheffield.academia.edu/FrancescoRomano

	[[alternative HTML version deleted]]