[R-sig-ME] Problem specifying uncorrelated random intercepts and slopes for a multi-df covariate

Fri Aug 5 16:16:54 CEST 2022

Many thanks, Ben. While I find your example of singularity a bit hard to
follow, I hope to have correctly identified a gist of "not enough data to
observe/estimate between-cluster variation in the slopes of x2B and x2C."

Even if that explains the zero estimates though, the fact that those
estimates become non-zero after adding correlation parameters remains
completely mystifying to me.

I've now switched to a different file hosting service: hopefully my dataset
<https://gofile.io/d/pS7O1Q> doesn't get deleted this time.

Best,

Juho

pe 5. elok. 2022 klo 1.12 Ben Bolker (bbolker using gmail.com) kirjoitti:

>    I will take a look if I get a chance.
>
>    "Singular" isn't quite the same as "inestimable", I think (although
> to be honest I'm not sure what your definition of "inestimable" is). It
> just means that the best estimate is on the boundary of the feasible
> space. There are various more and less mathy ways of
> restating/explaining that, one simple example is if you have a single
> grouping variable; if true between-group variance is v_b and true
> within-group variance is v_w, and there are N samples per group, the
> expected *observed* among-group variation+ is v_b + v_w/n. If the sample
> variance within groups is v'_w and the sample variance among groups is
> LESS THAN v'_w/n, then your best conclusion is that the between-group
> variance is negative! (or zero, if you're not allowing such
> impossibilities).
>
>    Singular fits are most common when the model is overfitted (too much
> complexity/too much noise/not enough signal/not enough groups), but can
> happen in many different circumstances.
>
>    When I try to retrieve your data file the web page says it has been
> deleted.
>
>
>
> On 2022-08-04 11:07 a.m., Juho Kristian Ruohonen wrote:
> > Many thanks, Ben and João. I did as advised, converting x2 into two
> dummies
> > and specifying the random effects as *(x2B+x2C||id)*. This yields the
> > correct number of estimated parameters (1 random intercepts, 2 random
> > slopes). However, there's something I don't understand about the results.
> >
> > Firstly, there's a warning about a singular fit, which I take to mean
> that
> > some parameters are inestimable. Judging from the following output, I
> > gather that it must be the 2 random slopes, which are estimated at
> > essentially zero:
> >
> > *> summary(slopes.nocorr)*
> > *...*
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > *Random effects: Groups Name        Variance  Std.Dev.  id
>  (Intercept)
> > 5.539e-01 7.443e-01 id.1   x2B         1.546e-14 1.243e-07 id.2   x2C
> >    0.000e+00 0.000e+00Number of obs: 1405, groups:  id, 292Fixed effects:
> >          Estimate Std. Error z value Pr(>|z|)    (Intercept) -0.93057
> >   0.27450  -3.390 0.000699 ***x1           0.51158    0.26550   1.927
> > 0.053997 .  x2B          2.54505    0.20936  12.156  < 2e-16 ***x2C
> >   2.30179    0.30480   7.552 4.29e-14 ***x3          -0.77494    0.11660
> >   -6.646 3.01e-11 ***x4           0.24489    0.04957   4.940 7.80e-07
> ***x5
> >            0.28619    0.13810   2.072 0.038235 *  x6          -1.07816
> >   0.90224  -1.195 0.232091    x7          -0.67521    0.32810  -2.058
> > 0.039595 *  *
> > *x8          -0.76275    0.28824  -2.646 0.008138 ** *
> >
> > It seems very strange that the random slopes should be inestimable: x2B
> and
> > x2C are not exceedingly scarce conditions: there are 277 observations of
> > the former, 91 of the latter. There are 52 IDs with observations of both
> xB
> > = 1 and xB = 0. And there are 33 IDs with observations of both xC = 1 and
> > xC = 0. So, I don't understand why the random slopes couldn't be
> estimated.
> >
> > Stranger still, if I fit an otherwise identical model with *correlated*
> random
> > effects, the random-slope estimates suddenly do differ from 0 (although
> > there's still a singularity warning). Like so:
> >
> > *> slopes.corr <-   glmer(y ~ (x2B+x2C|id)  + x1 + x2B + x2C + x3 + x4 +
> x5
> > + x6 + x7 + x8, family = binomial, data = mydata, control =
> > glmerControl(optimizer = "optimx", optCtrl = list(method = "nlm")), nAGQ
> =
> > 1)*
> > *...*
> >
> >
> >
> >
> >
> >
> > *Random effects: Groups Name        Variance Std.Dev. Corr        id
> > (Intercept) 0.5827   0.7633                      x2B         0.0798
> > 0.2825   -0.22              x2C         0.2060   0.4539   -0.68
> 0.86Number
> > of obs: 1405, groups:  id, 292*
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > *Fixed effects:            Estimate Std. Error z value Pr(>|z|)
> >   (Intercept) -0.98131    0.30416  -3.226  0.00125 ** x1
>  0.56391
> >   0.29692   1.899  0.05754 .  x2B          2.56529    0.28735   8.928  <
> > 2e-16 ***x2C          2.17394    0.41814   5.199 2.00e-07 ***x3
> >   -0.77852    0.11699  -6.655 2.84e-11 ***x4           0.24384    0.04982
> > 4.895 9.84e-07 ***x5           0.28790    0.14005   2.056  0.03981 *  x6
> >         -1.08438    0.91036  -1.191  0.23359    x7          -0.66753
> >   0.32962  -2.025  0.04285 *  x8          -0.75425    0.28913  -2.609
> >   0.00909 ** *
> >
> > It boggles my mind that the 2 random slopes should be inestimable in the
> > simpler model (with no correlation params) but somehow become estimable
> > when you introduce 3 more parameters by allowing random-effect
> > correlations. My brain has melted. Does anyone have a clue what's going
> on?
> > The anonymized datafile is available here <https://file.io/VKruszwJBwcK
> >.
> >
> > Best,
> >
> > Juho
> >
> >
> >
> > to 4. elok. 2022 klo 0.30 João Veríssimo (jl.verissimo using gmail.com)
> kirjoitti:
> >
> >> (1+x2 || id) is shorter notation for (1 | id) + (0 + x2 | id ).
> >> And because x2 is a factor, suppressing the intercept leads to the
> >> 'cell-mean coding' of x2: what is being estimated is the between-id
> >> variation around the means of each level, A, B, and C (and their
> >> correlation).
> >>
> >> In order to get what you want, turn x2 into two numeric variables
> >> according to its contrasts. For example:
> >> x2num1 <- ifelse(x2=="B", 1, 0)
> >> x2num2 <- ifelse(x2=="C", 1, 0)
> >>
> >> Then (1 + x2num1 + x2num2 || id) will give you the random intercept, two
> >> random slopes and no correlations.
> >>
> >> João
> >>
> >> On 03/08/2022 21:10, Juho Kristian Ruohonen wrote:
> >>> Dear List,
> >>>
> >>> This is a logistic GLMM with 1 grouping factor + 8 fixed-effect
> >> covariates.
> >>> One of the fixed effects, namely x2, has three unordered categories.
> This
> >>> is the covariate for whose 2 non-reference categories I want to
> estimate
> >>> random slopes, along with the random intercepts with which I don't
> expect
> >>> the slopes to be correlated. But I fail:
> >>>
> >>>
> >>>
> >>> *> VarCorr(glmer(y ~ (x2||id) + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8,
> >>> family = binomial, data = mydata, control = glmerControl(optimizer =
> >>> "optimx", optCtrl = list(method = "nlm")), nAGQ = 1))*
> >>>
> >>>
> >>> *boundary (singular) fit: see help('isSingular')*
> >>>
> >>>
> >>>
> >>>
> >>> * Groups Name        Std.Dev. Corr        id     (Intercept) 0.00000
> >>>          id.1   x2A         0.76331                     x2B
> >>   0.75422
> >>>    0.931              x2C         0.56139  0.807 0.967*
> >>>
> >>> ^ Why is it reporting correlations when I told it not to? And why is it
> >>> reporting the intercept variance as zero (which is wholly implausible)?
> >> And
> >>> why is it reporting a "random slope" for the reference category of x2?
> >> It's
> >>> the reference category, for crying out loud! It's not supposed to get
> an
> >>> estimate.
> >>>
> >>> Consultation of the lme4 manual
> >>> <https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf>
> (page
> >> 7)
> >>> suggests the following alternative syntax for specifying random slopes
> >>> uncorrelated with the random intercepts:
> >>>
> >>> *> VarCorr(glmer(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + (1|id) +
> >>> (0+x2|id), family = binomial, data = mydata, control =
> >>> glmerControl(optimizer = "optimx", optCtrl = list(method = "nlm")),
> nAGQ
> >> =
> >>> 1))*
> >>>
> >>> *boundary (singular) fit: see help('isSingular')*
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> * Groups Name        Std.Dev. Corr        id     (Intercept) 0.00000
> >>>          id.1   x2A         0.76331                     x2B
> >>   0.75422
> >>>    0.931              x2C         0.56139  0.807 0.967*
> >>>
> >>> ^ The exact same strangeness persists. Correlations are being estimated
> >>> against my wishes, and there's a nonsensical parameter supposedly
> >>> ostensibly representing the reference category, plus an implausible
> zero
> >>> value reported on the random intercepts. What am I doing wrong?
> >>>
> >>> Best,
> >>>
> >>> Juho
> >>>
> >>>        [[alternative HTML version deleted]]
> >>>
> >>> _______________________________________________
> >>> R-sig-mixed-models using r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>
> >>
> >
> >       [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > R-sig-mixed-models using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
> --
> Dr. Benjamin Bolker
> Professor, Mathematics & Statistics and Biology, McMaster University
> Director, School of Computational Science and Engineering
> (Acting) Graduate chair, Mathematics & Statistics
>  > E-mail is sent at my convenience; I don't expect replies outside of
> working hours.
>
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>

	[[alternative HTML version deleted]]