[R-sig-ME] Zero cells in contrast matrix problem
Ben Bolker
bbolker at gmail.com
Mon Oct 26 12:06:33 CET 2015
On 15-10-26 06:56 AM, Francesco Romano wrote:
> I wonder if anyone can help with the separation problem originally solved
> by Ben Bolker (see thread).
> The model and fitting I used previously was
>
> trial<-bglmer(Correct ~ Syntax.Semantics, data = trialglm, family =
> binomial, fixef.prior = normal(cov = diag(9,4))
>
> which now has to change because the Syntax.Semantcs factor needs to be
> split into separate
> within-subjects variables, Syntax, a factor with two levels, and Animacy, a
> factor with four levels.
> In addition a new between-subjects factor called Group with two levels
> (native vs non-native speaker)
> has to be added which determines the following model, fit by bglmer:
>
> trial<-bglmer(Correct ~ Syntax*Animacy*Group+ (1|Part.name)+(1|Item), data
> = trialglm, family = binomial,
> fixef.prior = normal(cov = diag???)
>
> What values should I use for the cov=diag portion in order to continue
> attempting convergence of a model
> that includes the random effects?
In general a reasonable form is normal(cov = diag(v,np)) where v is
the prior variance (generally something reasonably
large/non-informative; 9 (=std dev of 3) is probably an OK default) and
np is the number of fixed-effect parameters. You can figure this out via
ncol(model.matrix(~Syntax*Animacy*Group,data=trialglm)
or multiply 2*4*2 to get 16 ...
>
> R returns the following error because I don't know how to establish the
> parameters when more than one
> fixed effect is involved:
>
> Error in normal(cov = cov, common.scale = FALSE) :
> normal prior covariance of improper length
>
> Many thanks in advance for any help!
>
>
>
>
>
> On Thu, May 28, 2015 at 10:46 PM, Ben Bolker <bbolker at gmail.com> wrote:
>
>> I don't see your data -- I see a little tiny subset, but that's not
>> really enough for a reproducible example.
>>
>> This is the example given in the URL I sent:
>>
>> cmod_blme_L2 <- bglmer(predation~ttt+(1|block),data=newdat,
>> family=binomial,
>> fixef.prior = normal(cov = diag(9,4)))
>>
>> trial<-bglmer(Correct ~ Syntax.Semantics+(1|Part.name),
>> data =trialglm,
>> family = binomial,
>> fixef.prior = normal(cov=diag(9,8)))
>>
>> The last line specifies an 8x8 matrix (because you have 8 fixed effect
>> parameters) with a value of 9 on the diagonal, meaning the priors for
>> the fixed effects are independent and each is Normal with a sd of
>> sqrt(9)=3.
>>
>>
>> On Thu, May 28, 2015 at 3:25 PM, Francesco Romano
>> <francescobryanromano at gmail.com> wrote:
>>> Yes but this seems a bit above my head without your help. The data are in
>>> the three variables at the bottom of my email but I forgot to mention the
>>> random participant effect (n = 17). Thanks!
>>>
>>>
>>> Il giovedì 28 maggio 2015, Ben Bolker <bbolker at gmail.com> ha scritto:
>>>>
> On 15-05-28 06:55 AM, Francesco Romano wrote:
>>>>>> Many thanks to both.
>>>>>>
>>>>>> The approaches you suggest (and others online) help one deal with
>>>>>> the separation problem but don't offer any specific advice as to
>>>>>> how getting a valid p coefficient when comparing two levels of the
>>>>>> model vexed by separation.
>>>>>>
>>>>>> Ben, here's the output of the bglmer which by the way would be
>>>>>> ideal since it allows me to retain the random effect so that all my
>>>>>> pairwise comparisons are conducted using mixed effects.
>>>>>>
>>>>>>> trial<-bglmer(Correct ~ Syntax.Semantics+(1|Part.name), data =
>>>>>>> trialglm,
>>>>>> family = binomial) Warning message: package ‘blme’ was built under
>>>>>> R version 3.1.2
>>>>>>> summary(trial)
>>>>>> Cov prior : Part.name ~ wishart(df = 3.5, scale = Inf,
>>>>>> posterior.scale = cov, common.scale = TRUE) Prior dev : 1.4371
>>>>>>
>>>>>> Generalized linear mixed model fit by maximum likelihood (Laplace
>>>>>> Approximation) ['bglmerMod'] Family: binomial ( logit ) Formula:
>>>>>> Correct ~ Syntax.Semantics + (1 | Part.name) Data: trialglm
>>>>>>
>>>>>> AIC BIC logLik deviance df.resid 269.9 305.5 -126.0
>>>>>> 251.9 376
>>>>>>
>>>>>> Scaled residuals: Min 1Q Median 3Q Max -0.9828
>>>>>> -0.4281 -0.2445 -0.0002 5.7872
>>>>>>
>>>>>> Random effects: Groups Name Variance Std.Dev. Part.name
>>>>>> (Intercept) 0.3836 0.6194 Number of obs: 385, groups: Part.name,
>>>>>> 16
>>>>>>
>>>>>> Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept)
>>>>>> -1.8671 0.4538 -4.114 3.89e-05 *** Syntax.Semantics A
>>>>>> 0.8121 0.5397 1.505 0.1324 Syntax.Semantics B -16.4391
>>>>>> 1195.5031 -0.014 0.9890 Syntax.Semantics C -1.1323 0.7462
>>>>>> -1.517 0.1292 Syntax.Semantics D 0.1789 0.5853 0.306
>>>>>> 0.7598 Syntax.Semantics E -0.8071 0.7500 -1.076 0.2819
>>>>>> Syntax.Semantics F -1.5051 0.8575 -1.755 0.0792 .
>>>>>> Syntax.Semantics G 0.4395 0.5417 0.811 0.4171 ---
>>>>>> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>>>>>
>>>>>> Unfortunately the separation problem is still there. Should I be
>>>>>> constraining the parameter somehow? How would I do that? The data
>>>>>> is below.
>
> Did you read the section in the URL I suggested? Just using bglmer
> isn't enough; you also have to set a prior on the fixed effects.
>
> Your data don't seem to be attached (note that the mailing list
> strips most non-ASCII file types).
>
>>>>>>
>>>>>> In passing I also tried brglm which solves the separation problem
>>>>>> but tells me comparison is significant which I don't believe one
>>>>>> bit (see the data below). I am pretty sure about this because when
>>>>>> I reveled and look at the comparisons I was able to compute using
>>>>>> glmer, these turn out to be non-significant, when glmer told me
>>>>>> they were:
>>>>>>
>>>>>>> trial<-brglm(Correct ~ Syntax.Semantics, data = trialglm, family
>>>>>>> =
>>>>>> binomial) Warning messages: 1: package ‘elrm’ was built under R
>>>>>> version 3.1.2 2: package ‘coda’ was built under R version 3.1.3
>>>>>>> summary(trial)
>>>>>>
>>>>>> Call: brglm(formula = Correct ~ Syntax.Semantics, family =
>>>>>> binomial, data = trialglm)
>>>>>>
>>>>>>
>>>>>> Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept)
>>>>>> -1.6358 0.4035 -4.053 5.05e-05 *** Syntax.Semantics A 0.6689
>>>>>> 0.5169 1.294 0.1957 Syntax.Semantics B -3.0182 1.4902
>>>>>> -2.025 0.0428 * Syntax.Semantics C -1.0135 0.6889 -1.471
>>>>>> 0.1413 Syntax.Semantics D 0.1515 0.5571 0.272 0.7857
>>>>>> Syntax.Semantics E -0.7878 0.6937 -1.136 0.2561
>>>>>> Syntax.Semantics F -1.2874 0.7702 -1.672 0.0946 .
>>>>>> Syntax.Semantics G 0.4358 0.5186 0.840 0.4007 --- Signif.
>>>>>> codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>>>>>
>>>>>> (Dispersion parameter for binomial family taken to be 1)
>>>>>>
>>>>>> Null deviance: 262.51 on 384 degrees of freedom Residual
>>>>>> deviance: 256.22 on 377 degrees of freedom Penalized deviance:
>>>>>> 245.5554 AIC: 272.22
>>>>>>
>>>>>>
>>>>>> MCMCglmm is too complex for me.
>>>>>>
>>>>>> Wolfgang, I tried the penalized likelihood method (logistf
>>>>>> function) but output is hard to read:
>>>>>>
>>>>>>> trial<-logistf(Correct ~ Syntax.Semantics, data = trialglm,
>>>>>>> family =
>>>>>> binomial) Warning messages: 1: package ‘logistf’ was built under R
>>>>>> version 3.1.2 2: package ‘mice’ was built under R version 3.1.2
>>>>>>> summary(trial)
>>>>>> logistf(formula = Correct ~ Syntax.Semantics, data = trialglm,
>>>>>> family = binomial)
>>>>>>
>>>>>> Model fitted by Penalized ML Confidence intervals and p-values by
>>>>>> Profile Likelihood Profile Likelihood Profile Likelihood Profile
>>>>>> Likelihood Profile Likelihood Profile Likelihood Profile Likelihood
>>>>>> Profile Likelihood
>>>>>>
>>>>>> coef se(coef) lower 0.95 upper 0.95 Chisq p (Intercept)
>>>>>> 3.2094017 0.7724482 2.9648747 3.5127830 0.000000 1.000000e+00
>>>>>> Syntax.Semantics A 4.1767737 6.3254344 0.4224696 12.0673987
>>>>>> 64.224452 1.110223e-15 Syntax.Semantics B -1.0583602 0.8959376
>>>>>> -1.3963977 -0.7625216 0.000000 1.000000e+00 Syntax.Semantics C
>>>>>> -0.7299070 0.9308193 -1.0765598 -0.4180076 0.000000 1.000000e+00
>>>>>> Syntax.Semantics D 0.2314740 1.1563731 -0.1704535 0.6479908
>>>>>> 1.156512 2.821901e-01 Syntax.Semantics E -0.6476907 0.9771824
>>>>>> -1.0076740 -0.3164066 0.000000 1.000000e+00 Syntax.Semantics F
>>>>>> -0.8271499 0.9305931 -1.1743834 -0.5160799 0.000000 1.000000e+00
>>>>>> Syntax.Semantics G 0.9909046 1.3787175 0.5457741 1.5353981
>>>>>> 0.000000 1.000000e+00
>>>>>>
>>>>>> Likelihood ratio test=121.9841 on 7 df, p=0, n=385 Wald test =
>>>>>> 5.334321 on 7 df, p = 0.6192356
>>>>>>
>>>>>> In particular, what is this model telling me? That Z (my ref level)
>>>>>> and B are significantly different?
>>>>>>
>>>>>> I'm happy to try the elrm function with exact logistic regression
>>>>>> but I am not capable of programming it. Besides, would it give me
>>>>>> valid estimates for the comparison between the Z and B levels? The
>>>>>> data frame should look like this:
>>>>>>
>>>>>> Outcome variable (Correct, incorrect) Predictor variable (A, B, C,
>>>>>> D, E, F, G, Z) Counts (E: 38,3; B: 51,0; Z: 37,7; G: 40,12; D:
>>>>>> 36,8; C:45,3; A: 34,13; F:65,22).
>>>>>>
>>>>>> Thank you! F.
>>>>>>
>>>>>> On Thu, May 28, 2015 at 2:28 AM, Ben Bolker <bbolker at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> And for what it's worth, you can do this in conjunction with lme4
>>>>>>> by using the blme package instead (a thin Bayesian wrapper around
>>>>>>> lme4), or via the MCMCglmm package; see
>>>>>>> http://ms.mcmaster.ca/~bolker/R/misc/foxchapter/bolker_chap.html
>>>>>>> for an example (search for "complete separation").
>>>>>>>
>>>>>>> On Wed, May 27, 2015 at 5:21 PM, Viechtbauer Wolfgang (STAT)
>>>>>>> <wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
>>>>>>>> You may need to consider using an 'exact', Bayesian, or
>>>>>>>> penalized
>>>>>>> likelihood approach (along the lines proposed by Firth).
>>>>>>>>
>>>>>>>> Maybe a place to start:
>>>>>>>
>>>>>>>
>>> http://sas-and-r.blogspot.nl/2010/11/example-815-firth-logistic-regression.html
>>>>>>>>
>>>>>>>>
>>>>>>>
> Best,
>>>>>>>> Wolfgang
>>>>>>>>
>>>>>>>>> -----Original Message----- From: R-sig-mixed-models
>>>>>>>>> [mailto:r-sig-mixed-models-bounces at r- project.org] On Behalf
>>>>>>>>> Of Francesco Romano Sent: Wednesday, May 27, 2015 23:00 To:
>>>>>>>>> r-sig-mixed-models at r-project.org Subject: [R-sig-ME] Zero
>>>>>>>>> cells in contrast matrix problem
>>>>>>>>>
>>>>>>>>> After giving up on a glmer for my data, I remembered a post
>>>>>>>>> by Roger
>>>>>>> Levy
>>>>>>>>> suggesting to try the use non mixed effects glm when one of
>>>>>>>>> the cells in a matrix is zero.
>>>>>>>>>
>>>>>>>>> To put this into perspective:
>>>>>>>>>
>>>>>>>>>> trial<-glmer(Correct ~ Syntax.Semantics + (1 | Part.name),
>>>>>>>>>> data =
>>>>>>>>> trialglm, family = binomial)
>>>>>>>>>
>>>>>>>>> Warning messages: 1: In checkConv(attr(opt, "derivs"),
>>>>>>>>> opt$par, ctrl = control$checkConv, : Model failed to converge
>>>>>>>>> with max|grad| = 0.053657 (tol = 0.001, component 4) 2: In
>>>>>>>>> checkConv(attr(opt, "derivs"), opt$par, ctrl =
>>>>>>>>> control$checkConv, : Model is nearly unidentifiable: large
>>>>>>>>> eigenvalue ratio - Rescale variables?
>>>>>>>>>
>>>>>>>>> My data has a binary outcome, correct or incorrect, a fixed
>>>>>>>>> effect predictor factor with 8 levels, and a random effect
>>>>>>>>> for participants. I believe the problem R is encountering is
>>>>>>>>> with one level of the factor (let us call it level B) which
>>>>>>>>> has no counts (no I won' t try to post the table from the
>>>>>>>>> paper with the counts because I know it will get garbled
>>>>>>>>> up!).
>>>>>>>>>
>>>>>>>>> I attempt a glm with the same data:
>>>>>>>>>
>>>>>>>>>> trial<-glm(Correct ~ Syntax.Semantics, data = trialglm,
>>>>>>>>>> family =
>>>>>>>>> binomial)
>>>>>>>>>> anova(trial)
>>>>>>>>> Analysis of Deviance Table
>>>>>>>>>
>>>>>>>>> Model: binomial, link: logit
>>>>>>>>>
>>>>>>>>> Response: Correct
>>>>>>>>>
>>>>>>>>> Terms added sequentially (first to last)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Df Deviance Resid. Df Resid. Dev NULL
>>>>>>>>> 384 289.63 Syntax.Semantics 7 34.651 377
>>>>>>>>> 254.97
>>>>>>>>>> summary(trial)
>>>>>>>>>
>>>>>>>>> Call: glm(formula = Correct ~ Syntax.Semantics, family =
>>>>>>>>> binomial, data = trialglm)
>>>>>>>>>
>>>>>>>>> Deviance Residuals: Min 1Q Median 3Q
>>>>>>>>> Max -0.79480 -0.62569 -0.34474 -0.00013 2.52113
>>>>>>>>>
>>>>>>>>> Coefficients: Estimate Std. Error z value Pr(>|z|)
>>>>>>>>> (Intercept) -1.6917 0.4113 -4.113
>>>>>>>>> 3.91e-05 *** Syntax.Semantics A 0.7013 0.5241 1.338
>>>>>>>>> 0.1809 Syntax.Semantics B -16.8744 904.5273 -0.019
>>>>>>>>> 0.9851 Syntax.Semantics C -1.1015 0.7231 -1.523
>>>>>>>>> 0.1277 Syntax.Semantics D 0.1602 0.5667 0.283
>>>>>>>>> 0.7774 Syntax.Semantics E -0.8733 0.7267 -1.202
>>>>>>>>> 0.2295 Syntax.Semantics F -1.4438 0.8312 -1.737
>>>>>>>>> 0.0824 . Syntax.Semantics G 0.4630 0.5262 0.880
>>>>>>>>> 0.3789 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05
>>>>>>>>> ‘.’ 0.1 ‘ ’ 1
>>>>>>>>>
>>>>>>>>> (Dispersion parameter for binomial family taken to be 1)
>>>>>>>>>
>>>>>>>>> Null deviance: 289.63 on 384 degrees of freedom Residual
>>>>>>>>> deviance: 254.98 on 377 degrees of freedom AIC: 270.98
>>>>>>>>>
>>>>>>>>> Number of Fisher Scoring iterations: 17
>>>>>>>>>
>>>>>>>>> The comparison I'm interested in is between level B and the
>>>>>>>>> reference level but it cannot be estimated as shown by the
>>>>>>>>> ridiculously high estimate and SE value.
>>>>>>>>>
>>>>>>>>> Any suggestions on how to get a decent beta, SE, z, and p?
>>>>>>>>> It's the only comparison missing in the table for the levels
>>>>>>>>> I need so I think it
>>>>>>> would
>>>>>>>>> be a bit unacademic of me to close this deal saying 'the
>>>>>>>>> difference
>>>>>>> could
>>>>>>>>> not be estimated due to zero count'.
>>>>>>>>>
>>>>>>>>> And by the way I have seen this comparison being generated
>>>>>>>>> using other stats.
>>>>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>>
>>>>>>>>> Frank
>>>>>>>>>
>>>>>>>>> [[alternative HTML version deleted]]
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> R-sig-mixed-models at r-project.org mailing list
>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>> _______________________________________________
>>>>>>>> R-sig-mixed-models at r-project.org mailing list
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>
>>>>>>
>
>>>
>>>
>>>
>>> --
>>> Sent from Gmail for IPhone
>>
>
>
>
More information about the R-sig-mixed-models
mailing list