[R-sig-ME] Problem specifying uncorrelated random intercepts and slopes for a multi-df covariate
Juho Kristian Ruohonen
juho@kr|@t|@n@ruohonen @end|ng |rom gm@||@com
Fri Aug 5 16:16:54 CEST 2022
Many thanks, Ben. While I find your example of singularity a bit hard to
follow, I hope to have correctly identified a gist of "not enough data to
observe/estimate between-cluster variation in the slopes of x2B and x2C."
Even if that explains the zero estimates though, the fact that those
estimates become non-zero after adding correlation parameters remains
completely mystifying to me.
I've now switched to a different file hosting service: hopefully my dataset
<https://gofile.io/d/pS7O1Q> doesn't get deleted this time.
Best,
Juho
pe 5. elok. 2022 klo 1.12 Ben Bolker (bbolker using gmail.com) kirjoitti:
> I will take a look if I get a chance.
>
> "Singular" isn't quite the same as "inestimable", I think (although
> to be honest I'm not sure what your definition of "inestimable" is). It
> just means that the best estimate is on the boundary of the feasible
> space. There are various more and less mathy ways of
> restating/explaining that, one simple example is if you have a single
> grouping variable; if true between-group variance is v_b and true
> within-group variance is v_w, and there are N samples per group, the
> expected *observed* among-group variation+ is v_b + v_w/n. If the sample
> variance within groups is v'_w and the sample variance among groups is
> LESS THAN v'_w/n, then your best conclusion is that the between-group
> variance is negative! (or zero, if you're not allowing such
> impossibilities).
>
> Singular fits are most common when the model is overfitted (too much
> complexity/too much noise/not enough signal/not enough groups), but can
> happen in many different circumstances.
>
> When I try to retrieve your data file the web page says it has been
> deleted.
>
>
>
> On 2022-08-04 11:07 a.m., Juho Kristian Ruohonen wrote:
> > Many thanks, Ben and João. I did as advised, converting x2 into two
> dummies
> > and specifying the random effects as *(x2B+x2C||id)*. This yields the
> > correct number of estimated parameters (1 random intercepts, 2 random
> > slopes). However, there's something I don't understand about the results.
> >
> > Firstly, there's a warning about a singular fit, which I take to mean
> that
> > some parameters are inestimable. Judging from the following output, I
> > gather that it must be the 2 random slopes, which are estimated at
> > essentially zero:
> >
> > *> summary(slopes.nocorr)*
> > *...*
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > *Random effects: Groups Name Variance Std.Dev. id
> (Intercept)
> > 5.539e-01 7.443e-01 id.1 x2B 1.546e-14 1.243e-07 id.2 x2C
> > 0.000e+00 0.000e+00Number of obs: 1405, groups: id, 292Fixed effects:
> > Estimate Std. Error z value Pr(>|z|) (Intercept) -0.93057
> > 0.27450 -3.390 0.000699 ***x1 0.51158 0.26550 1.927
> > 0.053997 . x2B 2.54505 0.20936 12.156 < 2e-16 ***x2C
> > 2.30179 0.30480 7.552 4.29e-14 ***x3 -0.77494 0.11660
> > -6.646 3.01e-11 ***x4 0.24489 0.04957 4.940 7.80e-07
> ***x5
> > 0.28619 0.13810 2.072 0.038235 * x6 -1.07816
> > 0.90224 -1.195 0.232091 x7 -0.67521 0.32810 -2.058
> > 0.039595 * *
> > *x8 -0.76275 0.28824 -2.646 0.008138 ** *
> >
> > It seems very strange that the random slopes should be inestimable: x2B
> and
> > x2C are not exceedingly scarce conditions: there are 277 observations of
> > the former, 91 of the latter. There are 52 IDs with observations of both
> xB
> > = 1 and xB = 0. And there are 33 IDs with observations of both xC = 1 and
> > xC = 0. So, I don't understand why the random slopes couldn't be
> estimated.
> >
> > Stranger still, if I fit an otherwise identical model with *correlated*
> random
> > effects, the random-slope estimates suddenly do differ from 0 (although
> > there's still a singularity warning). Like so:
> >
> > *> slopes.corr <- glmer(y ~ (x2B+x2C|id) + x1 + x2B + x2C + x3 + x4 +
> x5
> > + x6 + x7 + x8, family = binomial, data = mydata, control =
> > glmerControl(optimizer = "optimx", optCtrl = list(method = "nlm")), nAGQ
> =
> > 1)*
> > *...*
> >
> >
> >
> >
> >
> >
> > *Random effects: Groups Name Variance Std.Dev. Corr id
> > (Intercept) 0.5827 0.7633 x2B 0.0798
> > 0.2825 -0.22 x2C 0.2060 0.4539 -0.68
> 0.86Number
> > of obs: 1405, groups: id, 292*
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > *Fixed effects: Estimate Std. Error z value Pr(>|z|)
> > (Intercept) -0.98131 0.30416 -3.226 0.00125 ** x1
> 0.56391
> > 0.29692 1.899 0.05754 . x2B 2.56529 0.28735 8.928 <
> > 2e-16 ***x2C 2.17394 0.41814 5.199 2.00e-07 ***x3
> > -0.77852 0.11699 -6.655 2.84e-11 ***x4 0.24384 0.04982
> > 4.895 9.84e-07 ***x5 0.28790 0.14005 2.056 0.03981 * x6
> > -1.08438 0.91036 -1.191 0.23359 x7 -0.66753
> > 0.32962 -2.025 0.04285 * x8 -0.75425 0.28913 -2.609
> > 0.00909 ** *
> >
> > It boggles my mind that the 2 random slopes should be inestimable in the
> > simpler model (with no correlation params) but somehow become estimable
> > when you introduce 3 more parameters by allowing random-effect
> > correlations. My brain has melted. Does anyone have a clue what's going
> on?
> > The anonymized datafile is available here <https://file.io/VKruszwJBwcK
> >.
> >
> > Best,
> >
> > Juho
> >
> >
> >
> > to 4. elok. 2022 klo 0.30 João Veríssimo (jl.verissimo using gmail.com)
> kirjoitti:
> >
> >> (1+x2 || id) is shorter notation for (1 | id) + (0 + x2 | id ).
> >> And because x2 is a factor, suppressing the intercept leads to the
> >> 'cell-mean coding' of x2: what is being estimated is the between-id
> >> variation around the means of each level, A, B, and C (and their
> >> correlation).
> >>
> >> In order to get what you want, turn x2 into two numeric variables
> >> according to its contrasts. For example:
> >> x2num1 <- ifelse(x2=="B", 1, 0)
> >> x2num2 <- ifelse(x2=="C", 1, 0)
> >>
> >> Then (1 + x2num1 + x2num2 || id) will give you the random intercept, two
> >> random slopes and no correlations.
> >>
> >> João
> >>
> >> On 03/08/2022 21:10, Juho Kristian Ruohonen wrote:
> >>> Dear List,
> >>>
> >>> This is a logistic GLMM with 1 grouping factor + 8 fixed-effect
> >> covariates.
> >>> One of the fixed effects, namely x2, has three unordered categories.
> This
> >>> is the covariate for whose 2 non-reference categories I want to
> estimate
> >>> random slopes, along with the random intercepts with which I don't
> expect
> >>> the slopes to be correlated. But I fail:
> >>>
> >>>
> >>>
> >>> *> VarCorr(glmer(y ~ (x2||id) + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8,
> >>> family = binomial, data = mydata, control = glmerControl(optimizer =
> >>> "optimx", optCtrl = list(method = "nlm")), nAGQ = 1))*
> >>>
> >>>
> >>> *boundary (singular) fit: see help('isSingular')*
> >>>
> >>>
> >>>
> >>>
> >>> * Groups Name Std.Dev. Corr id (Intercept) 0.00000
> >>> id.1 x2A 0.76331 x2B
> >> 0.75422
> >>> 0.931 x2C 0.56139 0.807 0.967*
> >>>
> >>> ^ Why is it reporting correlations when I told it not to? And why is it
> >>> reporting the intercept variance as zero (which is wholly implausible)?
> >> And
> >>> why is it reporting a "random slope" for the reference category of x2?
> >> It's
> >>> the reference category, for crying out loud! It's not supposed to get
> an
> >>> estimate.
> >>>
> >>> Consultation of the lme4 manual
> >>> <https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf>
> (page
> >> 7)
> >>> suggests the following alternative syntax for specifying random slopes
> >>> uncorrelated with the random intercepts:
> >>>
> >>> *> VarCorr(glmer(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + (1|id) +
> >>> (0+x2|id), family = binomial, data = mydata, control =
> >>> glmerControl(optimizer = "optimx", optCtrl = list(method = "nlm")),
> nAGQ
> >> =
> >>> 1))*
> >>>
> >>> *boundary (singular) fit: see help('isSingular')*
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> * Groups Name Std.Dev. Corr id (Intercept) 0.00000
> >>> id.1 x2A 0.76331 x2B
> >> 0.75422
> >>> 0.931 x2C 0.56139 0.807 0.967*
> >>>
> >>> ^ The exact same strangeness persists. Correlations are being estimated
> >>> against my wishes, and there's a nonsensical parameter supposedly
> >>> ostensibly representing the reference category, plus an implausible
> zero
> >>> value reported on the random intercepts. What am I doing wrong?
> >>>
> >>> Best,
> >>>
> >>> Juho
> >>>
> >>> [[alternative HTML version deleted]]
> >>>
> >>> _______________________________________________
> >>> R-sig-mixed-models using r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>
> >>
> >
> > [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > R-sig-mixed-models using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
> --
> Dr. Benjamin Bolker
> Professor, Mathematics & Statistics and Biology, McMaster University
> Director, School of Computational Science and Engineering
> (Acting) Graduate chair, Mathematics & Statistics
> > E-mail is sent at my convenience; I don't expect replies outside of
> working hours.
>
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
[[alternative HTML version deleted]]
More information about the R-sig-mixed-models
mailing list