[R-sig-ME] Zero cells in contrast matrix problem

Francesco Romano francescobryanromano at gmail.com
Mon Oct 26 12:18:02 CET 2015


For some reason the silly bugger didn't past the full command:

> revanaA<-
bglmer(Correct~Syntax*Animacy*Prof.group.2+(1|Part.name)+(1|Item), data =
revana, family = binomial, fixef.prior = normal(cov = diag(9,16)))
fixed-effect model matrix is rank deficient so dropping 2 columns /
coefficients
Error in normal(cov = cov, common.scale = FALSE) :
  normal prior covariance of improper length

To give more info on this, it is the Animacy factor that is causing
separation because two levels of it have zero counts in some cases.

On Mon, Oct 26, 2015 at 12:13 PM, Ben Bolker <bbolker at gmail.com> wrote:

> Well, that's a separate problem (and not necessarily a "problem").   R
> is telling you that you have 16 separate combinations of the factors,
> but only 14 unique combinations represented in your data set, so it
> can only estimate 14 parameters.  Unless there is a weird interaction
> with blme I don't know about, this should still give you reasonable
> answers.
>
> On Mon, Oct 26, 2015 at 7:10 AM, Francesco Romano
> <francescobryanromano at gmail.com> wrote:
> > Many thanks Ben,
> >
> > but I tried that already:
> >
> >> revanaA<-
> >> bglmer(Correct~Syntax*Animacy*Prof.group.2+(1|Part.name)+(1|Item), data
> =
> >> revana, family = binomial, fixef.prior = normal(cov = diag(9,16)))
> > fixed-effect model matrix is rank deficient so dropping 2 columns /
> > coefficients
> > Error in normal(cov = cov, common.scale = FALSE) :
> >   normal prior covariance of improper length
> >
> > On Mon, Oct 26, 2015 at 12:06 PM, Ben Bolker <bbolker at gmail.com> wrote:
> >>
> >> On 15-10-26 06:56 AM, Francesco Romano wrote:
> >> > I wonder if anyone can help with the separation problem originally
> >> > solved
> >> > by Ben Bolker (see thread).
> >> > The model and fitting I used previously was
> >> >
> >> > trial<-bglmer(Correct ~ Syntax.Semantics, data = trialglm, family =
> >> > binomial, fixef.prior = normal(cov = diag(9,4))
> >> >
> >> > which now has to change because the Syntax.Semantcs factor needs to be
> >> > split into separate
> >> > within-subjects variables, Syntax, a factor with two levels, and
> >> > Animacy, a
> >> > factor with four levels.
> >> > In addition a new between-subjects factor called Group with two levels
> >> > (native vs non-native speaker)
> >> > has to be added which determines the following model, fit by bglmer:
> >> >
> >> > trial<-bglmer(Correct ~ Syntax*Animacy*Group+ (1|Part.name)+(1|Item),
> >> > data
> >> > = trialglm, family = binomial,
> >> > fixef.prior = normal(cov = diag???)
> >> >
> >> > What values should I use for the cov=diag portion in order to continue
> >> > attempting convergence of a model
> >> > that includes the random effects?
> >>
> >>     In general a reasonable form is normal(cov = diag(v,np)) where v is
> >> the prior variance (generally something reasonably
> >> large/non-informative; 9 (=std dev of 3) is probably an OK default) and
> >> np is the number of fixed-effect parameters.  You can figure this out
> via
> >>
> >> ncol(model.matrix(~Syntax*Animacy*Group,data=trialglm)
> >>
> >> or multiply 2*4*2 to get 16 ...
> >>
> >> >
> >> > R returns the following error because I don't know how to establish
> the
> >> > parameters when more than one
> >> > fixed effect is involved:
> >> >
> >> > Error in normal(cov = cov, common.scale = FALSE) :
> >> >   normal prior covariance of improper length
> >> >
> >> > Many thanks in advance for any help!
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > On Thu, May 28, 2015 at 10:46 PM, Ben Bolker <bbolker at gmail.com>
> wrote:
> >> >
> >> >>   I don't see your data -- I see a little tiny subset, but that's not
> >> >> really enough for a reproducible example.
> >> >>
> >> >> This is the example given in the URL I sent:
> >> >>
> >> >> cmod_blme_L2 <- bglmer(predation~ttt+(1|block),data=newdat,
> >> >>                        family=binomial,
> >> >>                        fixef.prior = normal(cov = diag(9,4)))
> >> >>
> >> >> trial<-bglmer(Correct ~ Syntax.Semantics+(1|Part.name),
> >> >>           data =trialglm,
> >> >>          family = binomial,
> >> >>         fixef.prior = normal(cov=diag(9,8)))
> >> >>
> >> >> The last line specifies an 8x8 matrix (because you have 8 fixed
> effect
> >> >> parameters) with a value of 9 on the diagonal, meaning the priors for
> >> >> the fixed effects are independent and each is Normal with a sd of
> >> >> sqrt(9)=3.
> >> >>
> >> >>
> >> >> On Thu, May 28, 2015 at 3:25 PM, Francesco Romano
> >> >> <francescobryanromano at gmail.com> wrote:
> >> >>> Yes but this seems a bit above my head without your help. The data
> are
> >> >>> in
> >> >>> the three variables at the bottom of my email but I forgot to
> mention
> >> >>> the
> >> >>> random participant effect (n = 17). Thanks!
> >> >>>
> >> >>>
> >> >>> Il giovedì 28 maggio 2015, Ben Bolker <bbolker at gmail.com> ha
> scritto:
> >> >>>>
> >> > On 15-05-28 06:55 AM, Francesco Romano wrote:
> >> >>>>>> Many thanks to both.
> >> >>>>>>
> >> >>>>>> The approaches you suggest (and others online) help one deal with
> >> >>>>>> the separation problem but don't offer any specific advice as to
> >> >>>>>> how getting a valid p coefficient when comparing two levels of
> the
> >> >>>>>> model vexed by separation.
> >> >>>>>>
> >> >>>>>> Ben, here's the output of the bglmer which by the way would be
> >> >>>>>> ideal since it allows me to retain the random effect so that all
> my
> >> >>>>>> pairwise comparisons are conducted using mixed effects.
> >> >>>>>>
> >> >>>>>>> trial<-bglmer(Correct ~ Syntax.Semantics+(1|Part.name), data =
> >> >>>>>>> trialglm,
> >> >>>>>> family = binomial) Warning message: package ‘blme’ was built
> under
> >> >>>>>> R version 3.1.2
> >> >>>>>>> summary(trial)
> >> >>>>>> Cov prior  : Part.name ~ wishart(df = 3.5, scale = Inf,
> >> >>>>>> posterior.scale = cov, common.scale = TRUE) Prior dev  : 1.4371
> >> >>>>>>
> >> >>>>>> Generalized linear mixed model fit by maximum likelihood (Laplace
> >> >>>>>> Approximation) ['bglmerMod'] Family: binomial  ( logit ) Formula:
> >> >>>>>> Correct ~ Syntax.Semantics + (1 | Part.name) Data: trialglm
> >> >>>>>>
> >> >>>>>> AIC      BIC   logLik deviance df.resid 269.9    305.5   -126.0
> >> >>>>>> 251.9      376
> >> >>>>>>
> >> >>>>>> Scaled residuals: Min      1Q  Median      3Q     Max -0.9828
> >> >>>>>> -0.4281 -0.2445 -0.0002  5.7872
> >> >>>>>>
> >> >>>>>> Random effects: Groups    Name        Variance Std.Dev. Part.name
> >> >>>>>> (Intercept) 0.3836   0.6194 Number of obs: 385, groups:
> Part.name,
> >> >>>>>> 16
> >> >>>>>>
> >> >>>>>> Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept)
> >> >>>>>> -1.8671     0.4538  -4.114 3.89e-05 *** Syntax.Semantics A
> >> >>>>>> 0.8121     0.5397   1.505   0.1324 Syntax.Semantics B  -16.4391
> >> >>>>>> 1195.5031  -0.014   0.9890 Syntax.Semantics C   -1.1323
>  0.7462
> >> >>>>>> -1.517   0.1292 Syntax.Semantics D    0.1789     0.5853   0.306
> >> >>>>>> 0.7598 Syntax.Semantics E   -0.8071     0.7500  -1.076   0.2819
> >> >>>>>> Syntax.Semantics F   -1.5051     0.8575  -1.755   0.0792 .
> >> >>>>>> Syntax.Semantics G    0.4395     0.5417   0.811   0.4171 ---
> >> >>>>>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >> >>>>>>
> >> >>>>>> Unfortunately the separation problem is still there. Should I be
> >> >>>>>> constraining the parameter somehow? How would I do that? The data
> >> >>>>>> is below.
> >> >
> >> >    Did you read the section in the URL I suggested?  Just using bglmer
> >> > isn't enough; you also have to set a prior on the fixed effects.
> >> >
> >> >   Your data don't seem to be attached (note that the mailing list
> >> > strips most non-ASCII file types).
> >> >
> >> >>>>>>
> >> >>>>>> In passing I also tried brglm which solves the separation problem
> >> >>>>>> but tells me comparison is significant which I don't believe one
> >> >>>>>> bit (see the data below). I am pretty sure about this because
> when
> >> >>>>>> I reveled and look at the comparisons I was able to compute using
> >> >>>>>> glmer, these turn out to be non-significant, when glmer told me
> >> >>>>>> they were:
> >> >>>>>>
> >> >>>>>>> trial<-brglm(Correct ~ Syntax.Semantics, data = trialglm, family
> >> >>>>>>> =
> >> >>>>>> binomial) Warning messages: 1: package ‘elrm’ was built under R
> >> >>>>>> version 3.1.2 2: package ‘coda’ was built under R version 3.1.3
> >> >>>>>>> summary(trial)
> >> >>>>>>
> >> >>>>>> Call: brglm(formula = Correct ~ Syntax.Semantics, family =
> >> >>>>>> binomial, data = trialglm)
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept)
> >> >>>>>> -1.6358     0.4035  -4.053 5.05e-05 *** Syntax.Semantics A
>  0.6689
> >> >>>>>> 0.5169   1.294   0.1957 Syntax.Semantics B  -3.0182     1.4902
> >> >>>>>> -2.025   0.0428 * Syntax.Semantics C  -1.0135     0.6889  -1.471
> >> >>>>>> 0.1413 Syntax.Semantics D   0.1515     0.5571   0.272   0.7857
> >> >>>>>> Syntax.Semantics E  -0.7878     0.6937  -1.136   0.2561
> >> >>>>>> Syntax.Semantics F  -1.2874     0.7702  -1.672   0.0946 .
> >> >>>>>> Syntax.Semantics G   0.4358     0.5186   0.840   0.4007 ---
> Signif.
> >> >>>>>> codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >> >>>>>>
> >> >>>>>> (Dispersion parameter for binomial family taken to be 1)
> >> >>>>>>
> >> >>>>>> Null deviance: 262.51  on 384  degrees of freedom Residual
> >> >>>>>> deviance: 256.22  on 377  degrees of freedom Penalized deviance:
> >> >>>>>> 245.5554 AIC:  272.22
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> MCMCglmm is too complex for me.
> >> >>>>>>
> >> >>>>>> Wolfgang, I tried the penalized likelihood method (logistf
> >> >>>>>> function) but output is hard to read:
> >> >>>>>>
> >> >>>>>>> trial<-logistf(Correct ~ Syntax.Semantics, data = trialglm,
> >> >>>>>>> family =
> >> >>>>>> binomial) Warning messages: 1: package ‘logistf’ was built under
> R
> >> >>>>>> version 3.1.2 2: package ‘mice’ was built under R version 3.1.2
> >> >>>>>>> summary(trial)
> >> >>>>>> logistf(formula = Correct ~ Syntax.Semantics, data = trialglm,
> >> >>>>>> family = binomial)
> >> >>>>>>
> >> >>>>>> Model fitted by Penalized ML Confidence intervals and p-values by
> >> >>>>>> Profile Likelihood Profile Likelihood Profile Likelihood Profile
> >> >>>>>> Likelihood Profile Likelihood Profile Likelihood Profile
> Likelihood
> >> >>>>>> Profile Likelihood
> >> >>>>>>
> >> >>>>>> coef  se(coef) lower 0.95 upper 0.95 Chisq            p
> (Intercept)
> >> >>>>>> 3.2094017 0.7724482  2.9648747  3.5127830 0.000000 1.000000e+00
> >> >>>>>> Syntax.Semantics A  4.1767737 6.3254344  0.4224696 12.0673987
> >> >>>>>> 64.224452 1.110223e-15 Syntax.Semantics B -1.0583602 0.8959376
> >> >>>>>> -1.3963977 -0.7625216  0.000000 1.000000e+00 Syntax.Semantics C
> >> >>>>>> -0.7299070 0.9308193 -1.0765598 -0.4180076  0.000000 1.000000e+00
> >> >>>>>> Syntax.Semantics D  0.2314740 1.1563731 -0.1704535  0.6479908
> >> >>>>>> 1.156512 2.821901e-01 Syntax.Semantics E -0.6476907 0.9771824
> >> >>>>>> -1.0076740 -0.3164066  0.000000 1.000000e+00 Syntax.Semantics F
> >> >>>>>> -0.8271499 0.9305931 -1.1743834 -0.5160799  0.000000 1.000000e+00
> >> >>>>>> Syntax.Semantics G  0.9909046 1.3787175  0.5457741  1.5353981
> >> >>>>>> 0.000000 1.000000e+00
> >> >>>>>>
> >> >>>>>> Likelihood ratio test=121.9841 on 7 df, p=0, n=385 Wald test =
> >> >>>>>> 5.334321 on 7 df, p = 0.6192356
> >> >>>>>>
> >> >>>>>> In particular, what is this model telling me? That Z (my ref
> level)
> >> >>>>>> and B are significantly different?
> >> >>>>>>
> >> >>>>>> I'm happy to try the elrm function with exact logistic regression
> >> >>>>>> but I am not capable of programming it. Besides, would it give me
> >> >>>>>> valid estimates for the comparison between the Z and B levels?
> The
> >> >>>>>> data frame should look like this:
> >> >>>>>>
> >> >>>>>> Outcome variable (Correct, incorrect) Predictor variable (A, B,
> C,
> >> >>>>>> D, E, F, G, Z) Counts (E: 38,3; B: 51,0; Z: 37,7; G: 40,12; D:
> >> >>>>>> 36,8; C:45,3; A: 34,13; F:65,22).
> >> >>>>>>
> >> >>>>>> Thank you! F.
> >> >>>>>>
> >> >>>>>> On Thu, May 28, 2015 at 2:28 AM, Ben Bolker <bbolker at gmail.com>
> >> >>>>>> wrote:
> >> >>>>>>
> >> >>>>>>> And for what it's worth, you can do this in conjunction with
> lme4
> >> >>>>>>> by using the blme package instead (a thin Bayesian wrapper
> around
> >> >>>>>>> lme4), or via the MCMCglmm package; see
> >> >>>>>>>
> http://ms.mcmaster.ca/~bolker/R/misc/foxchapter/bolker_chap.html
> >> >>>>>>> for an example (search for "complete separation").
> >> >>>>>>>
> >> >>>>>>> On Wed, May 27, 2015 at 5:21 PM, Viechtbauer Wolfgang (STAT)
> >> >>>>>>> <wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
> >> >>>>>>>> You may need to consider using an 'exact', Bayesian, or
> >> >>>>>>>> penalized
> >> >>>>>>> likelihood approach (along the lines proposed by Firth).
> >> >>>>>>>>
> >> >>>>>>>> Maybe a place to start:
> >> >>>>>>>
> >> >>>>>>>
> >> >>>
> >> >>>
> http://sas-and-r.blogspot.nl/2010/11/example-815-firth-logistic-regression.html
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>
> >> > Best,
> >> >>>>>>>> Wolfgang
> >> >>>>>>>>
> >> >>>>>>>>> -----Original Message----- From: R-sig-mixed-models
> >> >>>>>>>>> [mailto:r-sig-mixed-models-bounces at r- project.org] On Behalf
> >> >>>>>>>>> Of Francesco Romano Sent: Wednesday, May 27, 2015 23:00 To:
> >> >>>>>>>>> r-sig-mixed-models at r-project.org Subject: [R-sig-ME] Zero
> >> >>>>>>>>> cells in contrast matrix problem
> >> >>>>>>>>>
> >> >>>>>>>>> After giving up on a glmer for my data, I remembered a post
> >> >>>>>>>>> by Roger
> >> >>>>>>> Levy
> >> >>>>>>>>> suggesting to try the use non mixed effects glm when one of
> >> >>>>>>>>> the cells in a matrix is zero.
> >> >>>>>>>>>
> >> >>>>>>>>> To put this into perspective:
> >> >>>>>>>>>
> >> >>>>>>>>>> trial<-glmer(Correct ~ Syntax.Semantics + (1 | Part.name),
> >> >>>>>>>>>> data =
> >> >>>>>>>>> trialglm, family = binomial)
> >> >>>>>>>>>
> >> >>>>>>>>> Warning messages: 1: In checkConv(attr(opt, "derivs"),
> >> >>>>>>>>> opt$par, ctrl = control$checkConv, : Model failed to converge
> >> >>>>>>>>> with max|grad| = 0.053657 (tol = 0.001, component 4) 2: In
> >> >>>>>>>>> checkConv(attr(opt, "derivs"), opt$par, ctrl =
> >> >>>>>>>>> control$checkConv, : Model is nearly unidentifiable: large
> >> >>>>>>>>> eigenvalue ratio - Rescale variables?
> >> >>>>>>>>>
> >> >>>>>>>>> My data has a binary outcome, correct or incorrect, a fixed
> >> >>>>>>>>> effect predictor factor with 8 levels, and a random effect
> >> >>>>>>>>> for participants. I believe the problem R is encountering is
> >> >>>>>>>>> with one level of the factor (let us call it level B) which
> >> >>>>>>>>> has no counts (no I won' t try to post the table from the
> >> >>>>>>>>> paper with the counts because I know it will get garbled
> >> >>>>>>>>> up!).
> >> >>>>>>>>>
> >> >>>>>>>>> I attempt a glm with the same data:
> >> >>>>>>>>>
> >> >>>>>>>>>> trial<-glm(Correct ~ Syntax.Semantics, data = trialglm,
> >> >>>>>>>>>> family =
> >> >>>>>>>>> binomial)
> >> >>>>>>>>>> anova(trial)
> >> >>>>>>>>> Analysis of Deviance Table
> >> >>>>>>>>>
> >> >>>>>>>>> Model: binomial, link: logit
> >> >>>>>>>>>
> >> >>>>>>>>> Response: Correct
> >> >>>>>>>>>
> >> >>>>>>>>> Terms added sequentially (first to last)
> >> >>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>> Df Deviance Resid. Df Resid. Dev NULL
> >> >>>>>>>>> 384     289.63 Syntax.Semantics  7   34.651       377
> >> >>>>>>>>> 254.97
> >> >>>>>>>>>> summary(trial)
> >> >>>>>>>>>
> >> >>>>>>>>> Call: glm(formula = Correct ~ Syntax.Semantics, family =
> >> >>>>>>>>> binomial, data = trialglm)
> >> >>>>>>>>>
> >> >>>>>>>>> Deviance Residuals: Min        1Q    Median        3Q
> >> >>>>>>>>> Max -0.79480  -0.62569  -0.34474  -0.00013   2.52113
> >> >>>>>>>>>
> >> >>>>>>>>> Coefficients: Estimate Std. Error z value Pr(>|z|)
> >> >>>>>>>>> (Intercept)                 -1.6917     0.4113  -4.113
> >> >>>>>>>>> 3.91e-05 *** Syntax.Semantics A   0.7013     0.5241   1.338
> >> >>>>>>>>> 0.1809 Syntax.Semantics B -16.8744   904.5273  -0.019
> >> >>>>>>>>> 0.9851 Syntax.Semantics C  -1.1015     0.7231  -1.523
> >> >>>>>>>>> 0.1277 Syntax.Semantics D   0.1602     0.5667   0.283
> >> >>>>>>>>> 0.7774 Syntax.Semantics E  -0.8733     0.7267  -1.202
> >> >>>>>>>>> 0.2295 Syntax.Semantics F  -1.4438     0.8312  -1.737
> >> >>>>>>>>> 0.0824 . Syntax.Semantics G   0.4630     0.5262   0.880
> >> >>>>>>>>> 0.3789 --- Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05
> >> >>>>>>>>> ‘.’ 0.1 ‘ ’ 1
> >> >>>>>>>>>
> >> >>>>>>>>> (Dispersion parameter for binomial family taken to be 1)
> >> >>>>>>>>>
> >> >>>>>>>>> Null deviance: 289.63  on 384  degrees of freedom Residual
> >> >>>>>>>>> deviance: 254.98  on 377  degrees of freedom AIC: 270.98
> >> >>>>>>>>>
> >> >>>>>>>>> Number of Fisher Scoring iterations: 17
> >> >>>>>>>>>
> >> >>>>>>>>> The comparison I'm interested in is between level B and the
> >> >>>>>>>>> reference level but it cannot be estimated as shown by the
> >> >>>>>>>>> ridiculously high estimate and SE value.
> >> >>>>>>>>>
> >> >>>>>>>>> Any suggestions on how to get a decent beta, SE, z, and p?
> >> >>>>>>>>> It's the only comparison missing in the table for the levels
> >> >>>>>>>>> I need so I think it
> >> >>>>>>> would
> >> >>>>>>>>> be a bit unacademic of me to close this deal saying 'the
> >> >>>>>>>>> difference
> >> >>>>>>> could
> >> >>>>>>>>> not be estimated due to zero count'.
> >> >>>>>>>>>
> >> >>>>>>>>> And by the way I have seen this comparison being generated
> >> >>>>>>>>> using other stats.
> >> >>>>>>>>>
> >> >>>>>>>>> Thanks in advance,
> >> >>>>>>>>>
> >> >>>>>>>>> Frank
> >> >>>>>>>>>
> >> >>>>>>>>> [[alternative HTML version deleted]]
> >> >>>>>>>>>
> >> >>>>>>>>> _______________________________________________
> >> >>>>>>>>> R-sig-mixed-models at r-project.org mailing list
> >> >>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >> >>>>>>>> _______________________________________________
> >> >>>>>>>> R-sig-mixed-models at r-project.org mailing list
> >> >>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >> >>>>>>>
> >> >>>>>>
> >> >
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Sent from Gmail for IPhone
> >> >>
> >> >
> >> >
> >> >
> >>
> >
> >
> >
> > --
> > Frank Romano Ph.D.
> >
> > Tel. +39 3911639149
> >
> > LinkedIn
> > https://it.linkedin.com/pub/francesco-bryan-romano/33/1/162
> >
> > Academia.edu
> > https://sheffield.academia.edu/FrancescoRomano
>



-- 
Frank Romano Ph.D.

Tel. +39 3911639149

*LinkedIn*
https://it.linkedin.com/pub/francesco-bryan-romano/33/1/162

*Academia.edu*
https://sheffield.academia.edu/FrancescoRomano

	[[alternative HTML version deleted]]



More information about the R-sig-mixed-models mailing list