[R-sig-ME] Problem specifying uncorrelated random intercepts and slopes for a multi-df covariate

Sun Aug 7 16:25:28 CEST 2022

Thanks João. That's a cool page with all the illustrations, although I
still can't claim to quite get it.

I'd assume that in order for the random slopes to correlate with the
intercepts, they would first have to be estimated at some non-zero value.
THEN I see how a correlation could arise and how the slopes could perhaps
be pulled towards that correlation. In the case at hand though, we have
random slopes that are essentially zero. Hence, I don't see how a
correlation with the intercepts could possibly arise. And without such a
correlation, I don't see how the slope estimates could be pulled towards
one.

I wonder if I'm missing something.

Best,

J

pe 5. elok. 2022 klo 17.57 João Veríssimo (jl.verissimo using gmail.com)
kirjoitti:

> Including correlation parameters can "pull" random effects (so to speak)
> in the direction of the estimated correlation. See here:
>
> https://doingbayesiandataanalysis.blogspot.com/2019/07/shrinkage-in-hierarchical-models-random.html
>
> So perhaps the slopes are changing quite a bit due to their correlation
> with the intercept?
> You could do some experimentation to test this, for example, keeping the
> correlation between slopes, but removing the intercept-slope correlations.
> Maybe something like this would do it: (1 | id) + (0 + x2B + x2C | id)
>
> (that said, it does seem a bit strange to me that both of them are zero
> when not estimating correlations)
>
> João
>
> On 05/08/2022 16:16, Juho Kristian Ruohonen wrote:
> > Many thanks, Ben. While I find your example of singularity a bit hard to
> > follow, I hope to have correctly identified a gist of "not enough data to
> > observe/estimate between-cluster variation in the slopes of x2B and x2C."
> >
> > Even if that explains the zero estimates though, the fact that those
> > estimates become non-zero after adding correlation parameters remains
> > completely mystifying to me.
> >
> > I've now switched to a different file hosting service: hopefully my
> dataset
> > <https://gofile.io/d/pS7O1Q> doesn't get deleted this time.
> >
> > Best,
> >
> > Juho
> >
> >
> >
> > pe 5. elok. 2022 klo 1.12 Ben Bolker (bbolker using gmail.com) kirjoitti:
> >
> >>     I will take a look if I get a chance.
> >>
> >>     "Singular" isn't quite the same as "inestimable", I think (although
> >> to be honest I'm not sure what your definition of "inestimable" is). It
> >> just means that the best estimate is on the boundary of the feasible
> >> space. There are various more and less mathy ways of
> >> restating/explaining that, one simple example is if you have a single
> >> grouping variable; if true between-group variance is v_b and true
> >> within-group variance is v_w, and there are N samples per group, the
> >> expected *observed* among-group variation+ is v_b + v_w/n. If the sample
> >> variance within groups is v'_w and the sample variance among groups is
> >> LESS THAN v'_w/n, then your best conclusion is that the between-group
> >> variance is negative! (or zero, if you're not allowing such
> >> impossibilities).
> >>
> >>     Singular fits are most common when the model is overfitted (too much
> >> complexity/too much noise/not enough signal/not enough groups), but can
> >> happen in many different circumstances.
> >>
> >>     When I try to retrieve your data file the web page says it has been
> >> deleted.
> >>
> >>
> >>
> >> On 2022-08-04 11:07 a.m., Juho Kristian Ruohonen wrote:
> >>> Many thanks, Ben and João. I did as advised, converting x2 into two
> >> dummies
> >>> and specifying the random effects as *(x2B+x2C||id)*. This yields the
> >>> correct number of estimated parameters (1 random intercepts, 2 random
> >>> slopes). However, there's something I don't understand about the
> results.
> >>>
> >>> Firstly, there's a warning about a singular fit, which I take to mean
> >> that
> >>> some parameters are inestimable. Judging from the following output, I
> >>> gather that it must be the 2 random slopes, which are estimated at
> >>> essentially zero:
> >>>
> >>> *> summary(slopes.nocorr)*
> >>> *...*
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> *Random effects: Groups Name        Variance  Std.Dev.  id
> >>   (Intercept)
> >>> 5.539e-01 7.443e-01 id.1   x2B         1.546e-14 1.243e-07 id.2   x2C
> >>>     0.000e+00 0.000e+00Number of obs: 1405, groups:  id, 292Fixed
> effects:
> >>>           Estimate Std. Error z value Pr(>|z|)    (Intercept) -0.93057
> >>>    0.27450  -3.390 0.000699 ***x1           0.51158    0.26550   1.927
> >>> 0.053997 .  x2B          2.54505    0.20936  12.156  < 2e-16 ***x2C
> >>>    2.30179    0.30480   7.552 4.29e-14 ***x3          -0.77494
> 0.11660
> >>>    -6.646 3.01e-11 ***x4           0.24489    0.04957   4.940 7.80e-07
> >> ***x5
> >>>             0.28619    0.13810   2.072 0.038235 *  x6          -1.07816
> >>>    0.90224  -1.195 0.232091    x7          -0.67521    0.32810  -2.058
> >>> 0.039595 *  *
> >>> *x8          -0.76275    0.28824  -2.646 0.008138 ** *
> >>>
> >>> It seems very strange that the random slopes should be inestimable: x2B
> >> and
> >>> x2C are not exceedingly scarce conditions: there are 277 observations
> of
> >>> the former, 91 of the latter. There are 52 IDs with observations of
> both
> >> xB
> >>> = 1 and xB = 0. And there are 33 IDs with observations of both xC = 1
> and
> >>> xC = 0. So, I don't understand why the random slopes couldn't be
> >> estimated.
> >>> Stranger still, if I fit an otherwise identical model with *correlated*
> >> random
> >>> effects, the random-slope estimates suddenly do differ from 0 (although
> >>> there's still a singularity warning). Like so:
> >>>
> >>> *> slopes.corr <-   glmer(y ~ (x2B+x2C|id)  + x1 + x2B + x2C + x3 + x4
> +
> >> x5
> >>> + x6 + x7 + x8, family = binomial, data = mydata, control =
> >>> glmerControl(optimizer = "optimx", optCtrl = list(method = "nlm")),
> nAGQ
> >> =
> >>> 1)*
> >>> *...*
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> *Random effects: Groups Name        Variance Std.Dev. Corr        id
> >>> (Intercept) 0.5827   0.7633                      x2B         0.0798
> >>> 0.2825   -0.22              x2C         0.2060   0.4539   -0.68
> >> 0.86Number
> >>> of obs: 1405, groups:  id, 292*
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> *Fixed effects:            Estimate Std. Error z value Pr(>|z|)
> >>>    (Intercept) -0.98131    0.30416  -3.226  0.00125 ** x1
> >>   0.56391
> >>>    0.29692   1.899  0.05754 .  x2B          2.56529    0.28735
>  8.928  <
> >>> 2e-16 ***x2C          2.17394    0.41814   5.199 2.00e-07 ***x3
> >>>    -0.77852    0.11699  -6.655 2.84e-11 ***x4           0.24384
> 0.04982
> >>> 4.895 9.84e-07 ***x5           0.28790    0.14005   2.056  0.03981 *
> x6
> >>>          -1.08438    0.91036  -1.191  0.23359    x7          -0.66753
> >>>    0.32962  -2.025  0.04285 *  x8          -0.75425    0.28913  -2.609
> >>>    0.00909 ** *
> >>>
> >>> It boggles my mind that the 2 random slopes should be inestimable in
> the
> >>> simpler model (with no correlation params) but somehow become estimable
> >>> when you introduce 3 more parameters by allowing random-effect
> >>> correlations. My brain has melted. Does anyone have a clue what's going
> >> on?
> >>> The anonymized datafile is available here <
> https://file.io/VKruszwJBwcK
> >>> .
> >>>
> >>> Best,
> >>>
> >>> Juho
> >>>
> >>>
> >>>
> >>> to 4. elok. 2022 klo 0.30 João Veríssimo (jl.verissimo using gmail.com)
> >> kirjoitti:
> >>>> (1+x2 || id) is shorter notation for (1 | id) + (0 + x2 | id ).
> >>>> And because x2 is a factor, suppressing the intercept leads to the
> >>>> 'cell-mean coding' of x2: what is being estimated is the between-id
> >>>> variation around the means of each level, A, B, and C (and their
> >>>> correlation).
> >>>>
> >>>> In order to get what you want, turn x2 into two numeric variables
> >>>> according to its contrasts. For example:
> >>>> x2num1 <- ifelse(x2=="B", 1, 0)
> >>>> x2num2 <- ifelse(x2=="C", 1, 0)
> >>>>
> >>>> Then (1 + x2num1 + x2num2 || id) will give you the random intercept,
> two
> >>>> random slopes and no correlations.
> >>>>
> >>>> João
> >>>>
> >>>> On 03/08/2022 21:10, Juho Kristian Ruohonen wrote:
> >>>>> Dear List,
> >>>>>
> >>>>> This is a logistic GLMM with 1 grouping factor + 8 fixed-effect
> >>>> covariates.
> >>>>> One of the fixed effects, namely x2, has three unordered categories.
> >> This
> >>>>> is the covariate for whose 2 non-reference categories I want to
> >> estimate
> >>>>> random slopes, along with the random intercepts with which I don't
> >> expect
> >>>>> the slopes to be correlated. But I fail:
> >>>>>
> >>>>>
> >>>>>
> >>>>> *> VarCorr(glmer(y ~ (x2||id) + x1 + x2 + x3 + x4 + x5 + x6 + x7 +
> x8,
> >>>>> family = binomial, data = mydata, control = glmerControl(optimizer =
> >>>>> "optimx", optCtrl = list(method = "nlm")), nAGQ = 1))*
> >>>>>
> >>>>>
> >>>>> *boundary (singular) fit: see help('isSingular')*
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> * Groups Name        Std.Dev. Corr        id     (Intercept) 0.00000
> >>>>>           id.1   x2A         0.76331                     x2B
> >>>>    0.75422
> >>>>>     0.931              x2C         0.56139  0.807 0.967*
> >>>>>
> >>>>> ^ Why is it reporting correlations when I told it not to? And why is
> it
> >>>>> reporting the intercept variance as zero (which is wholly
> implausible)?
> >>>> And
> >>>>> why is it reporting a "random slope" for the reference category of
> x2?
> >>>> It's
> >>>>> the reference category, for crying out loud! It's not supposed to get
> >> an
> >>>>> estimate.
> >>>>>
> >>>>> Consultation of the lme4 manual
> >>>>> <https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf>
> >> (page
> >>>> 7)
> >>>>> suggests the following alternative syntax for specifying random
> slopes
> >>>>> uncorrelated with the random intercepts:
> >>>>>
> >>>>> *> VarCorr(glmer(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + (1|id) +
> >>>>> (0+x2|id), family = binomial, data = mydata, control =
> >>>>> glmerControl(optimizer = "optimx", optCtrl = list(method = "nlm")),
> >> nAGQ
> >>>> =
> >>>>> 1))*
> >>>>>
> >>>>> *boundary (singular) fit: see help('isSingular')*
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> * Groups Name        Std.Dev. Corr        id     (Intercept) 0.00000
> >>>>>           id.1   x2A         0.76331                     x2B
> >>>>    0.75422
> >>>>>     0.931              x2C         0.56139  0.807 0.967*
> >>>>>
> >>>>> ^ The exact same strangeness persists. Correlations are being
> estimated
> >>>>> against my wishes, and there's a nonsensical parameter supposedly
> >>>>> ostensibly representing the reference category, plus an implausible
> >> zero
> >>>>> value reported on the random intercepts. What am I doing wrong?
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> Juho
> >>>>>
> >>>>>         [[alternative HTML version deleted]]
> >>>>>
> >>>>> _______________________________________________
> >>>>> R-sig-mixed-models using r-project.org mailing list
> >>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>>>
> >>>        [[alternative HTML version deleted]]
> >>>
> >>> _______________________________________________
> >>> R-sig-mixed-models using r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >> --
> >> Dr. Benjamin Bolker
> >> Professor, Mathematics & Statistics and Biology, McMaster University
> >> Director, School of Computational Science and Engineering
> >> (Acting) Graduate chair, Mathematics & Statistics
> >>   > E-mail is sent at my convenience; I don't expect replies outside of
> >> working hours.
> >>
> >> _______________________________________________
> >> R-sig-mixed-models using r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>
> >       [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > R-sig-mixed-models using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
>

	[[alternative HTML version deleted]]