[R-sig-ME] Zero cells in contrast matrix problem

Mon Oct 26 12:10:27 CET 2015

Many thanks Ben,

but I tried that already:

> revanaA<-
bglmer(Correct~Syntax*Animacy*Prof.group.2+(1|Part.name)+(1|Item), data =
revana, family = binomial, fixef.prior = normal(cov = diag(9,16)))
fixed-effect model matrix is rank deficient so dropping 2 columns /
coefficients
Error in normal(cov = cov, common.scale = FALSE) :
  normal prior covariance of improper length

On Mon, Oct 26, 2015 at 12:06 PM, Ben Bolker <bbolker at gmail.com> wrote:

> On 15-10-26 06:56 AM, Francesco Romano wrote:
> > I wonder if anyone can help with the separation problem originally solved
> > by Ben Bolker (see thread).
> > The model and fitting I used previously was
> >
> > trial<-bglmer(Correct ~ Syntax.Semantics, data = trialglm, family =
> > binomial, fixef.prior = normal(cov = diag(9,4))
> >
> > which now has to change because the Syntax.Semantcs factor needs to be
> > split into separate
> > within-subjects variables, Syntax, a factor with two levels, and
> Animacy, a
> > factor with four levels.
> > In addition a new between-subjects factor called Group with two levels
> > (native vs non-native speaker)
> > has to be added which determines the following model, fit by bglmer:
> >
> > trial<-bglmer(Correct ~ Syntax*Animacy*Group+ (1|Part.name)+(1|Item),
> data
> > = trialglm, family = binomial,
> > fixef.prior = normal(cov = diag???)
> >
> > What values should I use for the cov=diag portion in order to continue
> > attempting convergence of a model
> > that includes the random effects?
>
>     In general a reasonable form is normal(cov = diag(v,np)) where v is
> the prior variance (generally something reasonably
> large/non-informative; 9 (=std dev of 3) is probably an OK default) and
> np is the number of fixed-effect parameters.  You can figure this out via
>
> ncol(model.matrix(~Syntax*Animacy*Group,data=trialglm)
>
> or multiply 2*4*2 to get 16 ...
>
> >
> > R returns the following error because I don't know how to establish the
> > parameters when more than one
> > fixed effect is involved:
> >
> > Error in normal(cov = cov, common.scale = FALSE) :
> >   normal prior covariance of improper length
> >
> > Many thanks in advance for any help!
> >
> >
> >
> >
> >
> > On Thu, May 28, 2015 at 10:46 PM, Ben Bolker <bbolker at gmail.com> wrote:
> >
> >>   I don't see your data -- I see a little tiny subset, but that's not
> >> really enough for a reproducible example.
> >>
> >> This is the example given in the URL I sent:
> >>
> >> cmod_blme_L2 <- bglmer(predation~ttt+(1|block),data=newdat,
> >>                        family=binomial,
> >>                        fixef.prior = normal(cov = diag(9,4)))
> >>
> >> trial<-bglmer(Correct ~ Syntax.Semantics+(1|Part.name),
> >>           data =trialglm,
> >>          family = binomial,
> >>         fixef.prior = normal(cov=diag(9,8)))
> >>
> >> The last line specifies an 8x8 matrix (because you have 8 fixed effect
> >> parameters) with a value of 9 on the diagonal, meaning the priors for
> >> the fixed effects are independent and each is Normal with a sd of
> >> sqrt(9)=3.
> >>
> >>
> >> On Thu, May 28, 2015 at 3:25 PM, Francesco Romano
> >> <francescobryanromano at gmail.com> wrote:
> >>> Yes but this seems a bit above my head without your help. The data are
> in
> >>> the three variables at the bottom of my email but I forgot to mention
> the
> >>> random participant effect (n = 17). Thanks!
> >>>
> >>>
> >>> Il giovedì 28 maggio 2015, Ben Bolker <bbolker at gmail.com> ha scritto:
> >>>>
> > On 15-05-28 06:55 AM, Francesco Romano wrote:
> >>>>>> Many thanks to both.
> >>>>>>
> >>>>>> The approaches you suggest (and others online) help one deal with
> >>>>>> the separation problem but don't offer any specific advice as to
> >>>>>> how getting a valid p coefficient when comparing two levels of the
> >>>>>> model vexed by separation.
> >>>>>>
> >>>>>> Ben, here's the output of the bglmer which by the way would be
> >>>>>> ideal since it allows me to retain the random effect so that all my
> >>>>>> pairwise comparisons are conducted using mixed effects.
> >>>>>>
> >>>>>>> trial<-bglmer(Correct ~ Syntax.Semantics+(1|Part.name), data =
> >>>>>>> trialglm,
> >>>>>> family = binomial) Warning message: package ‘blme’ was built under
> >>>>>> R version 3.1.2
> >>>>>>> summary(trial)
> >>>>>> Cov prior  : Part.name ~ wishart(df = 3.5, scale = Inf,
> >>>>>> posterior.scale = cov, common.scale = TRUE) Prior dev  : 1.4371
> >>>>>>
> >>>>>> Generalized linear mixed model fit by maximum likelihood (Laplace
> >>>>>> Approximation) ['bglmerMod'] Family: binomial  ( logit ) Formula:
> >>>>>> Correct ~ Syntax.Semantics + (1 | Part.name) Data: trialglm
> >>>>>>
> >>>>>> AIC      BIC   logLik deviance df.resid 269.9    305.5   -126.0
> >>>>>> 251.9      376
> >>>>>>
> >>>>>> Scaled residuals: Min      1Q  Median      3Q     Max -0.9828
> >>>>>> -0.4281 -0.2445 -0.0002  5.7872
> >>>>>>
> >>>>>> Random effects: Groups    Name        Variance Std.Dev. Part.name
> >>>>>> (Intercept) 0.3836   0.6194 Number of obs: 385, groups:  Part.name,
> >>>>>> 16
> >>>>>>
> >>>>>> Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept)
> >>>>>> -1.8671     0.4538  -4.114 3.89e-05 *** Syntax.Semantics A
> >>>>>> 0.8121     0.5397   1.505   0.1324 Syntax.Semantics B  -16.4391
> >>>>>> 1195.5031  -0.014   0.9890 Syntax.Semantics C   -1.1323     0.7462
> >>>>>> -1.517   0.1292 Syntax.Semantics D    0.1789     0.5853   0.306
> >>>>>> 0.7598 Syntax.Semantics E   -0.8071     0.7500  -1.076   0.2819
> >>>>>> Syntax.Semantics F   -1.5051     0.8575  -1.755   0.0792 .
> >>>>>> Syntax.Semantics G    0.4395     0.5417   0.811   0.4171 ---
> >>>>>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >>>>>>
> >>>>>> Unfortunately the separation problem is still there. Should I be
> >>>>>> constraining the parameter somehow? How would I do that? The data
> >>>>>> is below.
> >
> >    Did you read the section in the URL I suggested?  Just using bglmer
> > isn't enough; you also have to set a prior on the fixed effects.
> >
> >   Your data don't seem to be attached (note that the mailing list
> > strips most non-ASCII file types).
> >
> >>>>>>
> >>>>>> In passing I also tried brglm which solves the separation problem
> >>>>>> but tells me comparison is significant which I don't believe one
> >>>>>> bit (see the data below). I am pretty sure about this because when
> >>>>>> I reveled and look at the comparisons I was able to compute using
> >>>>>> glmer, these turn out to be non-significant, when glmer told me
> >>>>>> they were:
> >>>>>>
> >>>>>>> trial<-brglm(Correct ~ Syntax.Semantics, data = trialglm, family
> >>>>>>> =
> >>>>>> binomial) Warning messages: 1: package ‘elrm’ was built under R
> >>>>>> version 3.1.2 2: package ‘coda’ was built under R version 3.1.3
> >>>>>>> summary(trial)
> >>>>>>
> >>>>>> Call: brglm(formula = Correct ~ Syntax.Semantics, family =
> >>>>>> binomial, data = trialglm)
> >>>>>>
> >>>>>>
> >>>>>> Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept)
> >>>>>> -1.6358     0.4035  -4.053 5.05e-05 *** Syntax.Semantics A   0.6689
> >>>>>> 0.5169   1.294   0.1957 Syntax.Semantics B  -3.0182     1.4902
> >>>>>> -2.025   0.0428 * Syntax.Semantics C  -1.0135     0.6889  -1.471
> >>>>>> 0.1413 Syntax.Semantics D   0.1515     0.5571   0.272   0.7857
> >>>>>> Syntax.Semantics E  -0.7878     0.6937  -1.136   0.2561
> >>>>>> Syntax.Semantics F  -1.2874     0.7702  -1.672   0.0946 .
> >>>>>> Syntax.Semantics G   0.4358     0.5186   0.840   0.4007 --- Signif.
> >>>>>> codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >>>>>>
> >>>>>> (Dispersion parameter for binomial family taken to be 1)
> >>>>>>
> >>>>>> Null deviance: 262.51  on 384  degrees of freedom Residual
> >>>>>> deviance: 256.22  on 377  degrees of freedom Penalized deviance:
> >>>>>> 245.5554 AIC:  272.22
> >>>>>>
> >>>>>>
> >>>>>> MCMCglmm is too complex for me.
> >>>>>>
> >>>>>> Wolfgang, I tried the penalized likelihood method (logistf
> >>>>>> function) but output is hard to read:
> >>>>>>
> >>>>>>> trial<-logistf(Correct ~ Syntax.Semantics, data = trialglm,
> >>>>>>> family =
> >>>>>> binomial) Warning messages: 1: package ‘logistf’ was built under R
> >>>>>> version 3.1.2 2: package ‘mice’ was built under R version 3.1.2
> >>>>>>> summary(trial)
> >>>>>> logistf(formula = Correct ~ Syntax.Semantics, data = trialglm,
> >>>>>> family = binomial)
> >>>>>>
> >>>>>> Model fitted by Penalized ML Confidence intervals and p-values by
> >>>>>> Profile Likelihood Profile Likelihood Profile Likelihood Profile
> >>>>>> Likelihood Profile Likelihood Profile Likelihood Profile Likelihood
> >>>>>> Profile Likelihood
> >>>>>>
> >>>>>> coef  se(coef) lower 0.95 upper 0.95 Chisq            p (Intercept)
> >>>>>> 3.2094017 0.7724482  2.9648747  3.5127830 0.000000 1.000000e+00
> >>>>>> Syntax.Semantics A  4.1767737 6.3254344  0.4224696 12.0673987
> >>>>>> 64.224452 1.110223e-15 Syntax.Semantics B -1.0583602 0.8959376
> >>>>>> -1.3963977 -0.7625216  0.000000 1.000000e+00 Syntax.Semantics C
> >>>>>> -0.7299070 0.9308193 -1.0765598 -0.4180076  0.000000 1.000000e+00
> >>>>>> Syntax.Semantics D  0.2314740 1.1563731 -0.1704535  0.6479908
> >>>>>> 1.156512 2.821901e-01 Syntax.Semantics E -0.6476907 0.9771824
> >>>>>> -1.0076740 -0.3164066  0.000000 1.000000e+00 Syntax.Semantics F
> >>>>>> -0.8271499 0.9305931 -1.1743834 -0.5160799  0.000000 1.000000e+00
> >>>>>> Syntax.Semantics G  0.9909046 1.3787175  0.5457741  1.5353981
> >>>>>> 0.000000 1.000000e+00
> >>>>>>
> >>>>>> Likelihood ratio test=121.9841 on 7 df, p=0, n=385 Wald test =
> >>>>>> 5.334321 on 7 df, p = 0.6192356
> >>>>>>
> >>>>>> In particular, what is this model telling me? That Z (my ref level)
> >>>>>> and B are significantly different?
> >>>>>>
> >>>>>> I'm happy to try the elrm function with exact logistic regression
> >>>>>> but I am not capable of programming it. Besides, would it give me
> >>>>>> valid estimates for the comparison between the Z and B levels? The
> >>>>>> data frame should look like this:
> >>>>>>
> >>>>>> Outcome variable (Correct, incorrect) Predictor variable (A, B, C,
> >>>>>> D, E, F, G, Z) Counts (E: 38,3; B: 51,0; Z: 37,7; G: 40,12; D:
> >>>>>> 36,8; C:45,3; A: 34,13; F:65,22).
> >>>>>>
> >>>>>> Thank you! F.
> >>>>>>
> >>>>>> On Thu, May 28, 2015 at 2:28 AM, Ben Bolker <bbolker at gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> And for what it's worth, you can do this in conjunction with lme4
> >>>>>>> by using the blme package instead (a thin Bayesian wrapper around
> >>>>>>> lme4), or via the MCMCglmm package; see
> >>>>>>> http://ms.mcmaster.ca/~bolker/R/misc/foxchapter/bolker_chap.html
> >>>>>>> for an example (search for "complete separation").
> >>>>>>>
> >>>>>>> On Wed, May 27, 2015 at 5:21 PM, Viechtbauer Wolfgang (STAT)
> >>>>>>> <wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
> >>>>>>>> You may need to consider using an 'exact', Bayesian, or
> >>>>>>>> penalized
> >>>>>>> likelihood approach (along the lines proposed by Firth).
> >>>>>>>>
> >>>>>>>> Maybe a place to start:
> >>>>>>>
> >>>>>>>
> >>>
> http://sas-and-r.blogspot.nl/2010/11/example-815-firth-logistic-regression.html
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> > Best,
> >>>>>>>> Wolfgang
> >>>>>>>>
> >>>>>>>>> -----Original Message----- From: R-sig-mixed-models
> >>>>>>>>> [mailto:r-sig-mixed-models-bounces at r- project.org] On Behalf
> >>>>>>>>> Of Francesco Romano Sent: Wednesday, May 27, 2015 23:00 To:
> >>>>>>>>> r-sig-mixed-models at r-project.org Subject: [R-sig-ME] Zero
> >>>>>>>>> cells in contrast matrix problem
> >>>>>>>>>
> >>>>>>>>> After giving up on a glmer for my data, I remembered a post
> >>>>>>>>> by Roger
> >>>>>>> Levy
> >>>>>>>>> suggesting to try the use non mixed effects glm when one of
> >>>>>>>>> the cells in a matrix is zero.
> >>>>>>>>>
> >>>>>>>>> To put this into perspective:
> >>>>>>>>>
> >>>>>>>>>> trial<-glmer(Correct ~ Syntax.Semantics + (1 | Part.name),
> >>>>>>>>>> data =
> >>>>>>>>> trialglm, family = binomial)
> >>>>>>>>>
> >>>>>>>>> Warning messages: 1: In checkConv(attr(opt, "derivs"),
> >>>>>>>>> opt$par, ctrl = control$checkConv, : Model failed to converge
> >>>>>>>>> with max|grad| = 0.053657 (tol = 0.001, component 4) 2: In
> >>>>>>>>> checkConv(attr(opt, "derivs"), opt$par, ctrl =
> >>>>>>>>> control$checkConv, : Model is nearly unidentifiable: large
> >>>>>>>>> eigenvalue ratio - Rescale variables?
> >>>>>>>>>
> >>>>>>>>> My data has a binary outcome, correct or incorrect, a fixed
> >>>>>>>>> effect predictor factor with 8 levels, and a random effect
> >>>>>>>>> for participants. I believe the problem R is encountering is
> >>>>>>>>> with one level of the factor (let us call it level B) which
> >>>>>>>>> has no counts (no I won' t try to post the table from the
> >>>>>>>>> paper with the counts because I know it will get garbled
> >>>>>>>>> up!).
> >>>>>>>>>
> >>>>>>>>> I attempt a glm with the same data:
> >>>>>>>>>
> >>>>>>>>>> trial<-glm(Correct ~ Syntax.Semantics, data = trialglm,
> >>>>>>>>>> family =
> >>>>>>>>> binomial)
> >>>>>>>>>> anova(trial)
> >>>>>>>>> Analysis of Deviance Table
> >>>>>>>>>
> >>>>>>>>> Model: binomial, link: logit
> >>>>>>>>>
> >>>>>>>>> Response: Correct
> >>>>>>>>>
> >>>>>>>>> Terms added sequentially (first to last)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Df Deviance Resid. Df Resid. Dev NULL
> >>>>>>>>> 384     289.63 Syntax.Semantics  7   34.651       377
> >>>>>>>>> 254.97
> >>>>>>>>>> summary(trial)
> >>>>>>>>>
> >>>>>>>>> Call: glm(formula = Correct ~ Syntax.Semantics, family =
> >>>>>>>>> binomial, data = trialglm)
> >>>>>>>>>
> >>>>>>>>> Deviance Residuals: Min        1Q    Median        3Q
> >>>>>>>>> Max -0.79480  -0.62569  -0.34474  -0.00013   2.52113
> >>>>>>>>>
> >>>>>>>>> Coefficients: Estimate Std. Error z value Pr(>|z|)
> >>>>>>>>> (Intercept)                 -1.6917     0.4113  -4.113
> >>>>>>>>> 3.91e-05 *** Syntax.Semantics A   0.7013     0.5241   1.338
> >>>>>>>>> 0.1809 Syntax.Semantics B -16.8744   904.5273  -0.019
> >>>>>>>>> 0.9851 Syntax.Semantics C  -1.1015     0.7231  -1.523
> >>>>>>>>> 0.1277 Syntax.Semantics D   0.1602     0.5667   0.283
> >>>>>>>>> 0.7774 Syntax.Semantics E  -0.8733     0.7267  -1.202
> >>>>>>>>> 0.2295 Syntax.Semantics F  -1.4438     0.8312  -1.737
> >>>>>>>>> 0.0824 . Syntax.Semantics G   0.4630     0.5262   0.880
> >>>>>>>>> 0.3789 --- Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05
> >>>>>>>>> ‘.’ 0.1 ‘ ’ 1
> >>>>>>>>>
> >>>>>>>>> (Dispersion parameter for binomial family taken to be 1)
> >>>>>>>>>
> >>>>>>>>> Null deviance: 289.63  on 384  degrees of freedom Residual
> >>>>>>>>> deviance: 254.98  on 377  degrees of freedom AIC: 270.98
> >>>>>>>>>
> >>>>>>>>> Number of Fisher Scoring iterations: 17
> >>>>>>>>>
> >>>>>>>>> The comparison I'm interested in is between level B and the
> >>>>>>>>> reference level but it cannot be estimated as shown by the
> >>>>>>>>> ridiculously high estimate and SE value.
> >>>>>>>>>
> >>>>>>>>> Any suggestions on how to get a decent beta, SE, z, and p?
> >>>>>>>>> It's the only comparison missing in the table for the levels
> >>>>>>>>> I need so I think it
> >>>>>>> would
> >>>>>>>>> be a bit unacademic of me to close this deal saying 'the
> >>>>>>>>> difference
> >>>>>>> could
> >>>>>>>>> not be estimated due to zero count'.
> >>>>>>>>>
> >>>>>>>>> And by the way I have seen this comparison being generated
> >>>>>>>>> using other stats.
> >>>>>>>>>
> >>>>>>>>> Thanks in advance,
> >>>>>>>>>
> >>>>>>>>> Frank
> >>>>>>>>>
> >>>>>>>>> [[alternative HTML version deleted]]
> >>>>>>>>>
> >>>>>>>>> _______________________________________________
> >>>>>>>>> R-sig-mixed-models at r-project.org mailing list
> >>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>>>>>>> _______________________________________________
> >>>>>>>> R-sig-mixed-models at r-project.org mailing list
> >>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>>>>>>
> >>>>>>
> >
> >>>
> >>>
> >>>
> >>> --
> >>> Sent from Gmail for IPhone
> >>
> >
> >
> >
>
>

-- 
Frank Romano Ph.D.

Tel. +39 3911639149

*LinkedIn*
https://it.linkedin.com/pub/francesco-bryan-romano/33/1/162

*Academia.edu*
https://sheffield.academia.edu/FrancescoRomano

	[[alternative HTML version deleted]]