[R-sig-ME] Problem specifying uncorrelated random intercepts and slopes for a multi-df covariate
João Veríssimo
jl.verissimo using gmail.com
Fri Aug 5 16:57:34 CEST 2022
Including correlation parameters can "pull" random effects (so to speak)
in the direction of the estimated correlation. See here:
https://doingbayesiandataanalysis.blogspot.com/2019/07/shrinkage-in-hierarchical-models-random.html
So perhaps the slopes are changing quite a bit due to their correlation
with the intercept?
You could do some experimentation to test this, for example, keeping the
correlation between slopes, but removing the intercept-slope correlations.
Maybe something like this would do it: (1 | id) + (0 + x2B + x2C | id)
(that said, it does seem a bit strange to me that both of them are zero
when not estimating correlations)
João
On 05/08/2022 16:16, Juho Kristian Ruohonen wrote:
> Many thanks, Ben. While I find your example of singularity a bit hard to
> follow, I hope to have correctly identified the gist: "not enough data to
> observe/estimate between-cluster variation in the slopes of x2B and x2C."
>
> Even if that explains the zero estimates though, the fact that those
> estimates become non-zero after adding correlation parameters remains
> completely mystifying to me.
>
> I've now switched to a different file hosting service: hopefully my dataset
> <https://gofile.io/d/pS7O1Q> doesn't get deleted this time.
>
> Best,
>
> Juho
>
>
>
> On Fri 5 Aug 2022 at 1.12, Ben Bolker (bbolker using gmail.com) wrote:
>
>> I will take a look if I get a chance.
>>
>> "Singular" isn't quite the same as "inestimable", I think (although
>> to be honest I'm not sure what your definition of "inestimable" is). It
>> just means that the best estimate is on the boundary of the feasible
>> space. There are various more and less mathy ways of
>> restating/explaining that. One simple example: suppose you have a single
>> grouping variable; if the true between-group variance is v_b and the true
>> within-group variance is v_w, and there are n samples per group, then the
>> expected *observed* among-group variance is v_b + v_w/n. If the sample
>> variance within groups is v'_w and the sample variance among groups is
>> LESS THAN v'_w/n, then your best conclusion is that the between-group
>> variance is negative! (or zero, if you're not allowing such
>> impossibilities).
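A tiny base-R illustration of that arithmetic, with made-up numbers (two groups, n = 2 observations per group):

```r
# Two groups whose means happen to coincide, but with large within-group
# spread: the method-of-moments estimate of the between-group variance
# comes out negative, which lmer would truncate to 0 (a singular fit).
g1 <- c(0, 10)
g2 <- c(4, 6)
n  <- 2

vw_hat     <- mean(c(var(g1), var(g2)))   # pooled within-group variance: 26
vamong_hat <- var(c(mean(g1), mean(g2)))  # variance of the group means: 0
vb_hat     <- vamong_hat - vw_hat / n     # between-group estimate: -13
```

Since the observed among-group variance (0) is less than v'_w/n (13), the best unconstrained estimate of v_b is negative.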
>>
>> Singular fits are most common when the model is overfitted (too much
>> complexity/too much noise/not enough signal/not enough groups), but can
>> happen in many different circumstances.
>>
>> When I try to retrieve your data file the web page says it has been
>> deleted.
>>
>>
>>
>> On 2022-08-04 11:07 a.m., Juho Kristian Ruohonen wrote:
>>> Many thanks, Ben and João. I did as advised, converting x2 into two dummies
>>> and specifying the random effects as (x2B + x2C || id). This yields the
>>> correct number of estimated parameters (1 random intercept, 2 random
>>> slopes). However, there's something I don't understand about the results.
>>>
>>> Firstly, there's a warning about a singular fit, which I take to mean that
>>> some parameters are inestimable. Judging from the following output, I
>>> gather that it must be the 2 random slopes, which are estimated at
>>> essentially zero:
>>>
>>> > summary(slopes.nocorr)
>>> ...
>>> Random effects:
>>>  Groups Name        Variance  Std.Dev.
>>>  id     (Intercept) 5.539e-01 7.443e-01
>>>  id.1   x2B         1.546e-14 1.243e-07
>>>  id.2   x2C         0.000e+00 0.000e+00
>>> Number of obs: 1405, groups:  id, 292
>>>
>>> Fixed effects:
>>>             Estimate Std. Error z value Pr(>|z|)
>>> (Intercept) -0.93057    0.27450  -3.390 0.000699 ***
>>> x1           0.51158    0.26550   1.927 0.053997 .
>>> x2B          2.54505    0.20936  12.156  < 2e-16 ***
>>> x2C          2.30179    0.30480   7.552 4.29e-14 ***
>>> x3          -0.77494    0.11660  -6.646 3.01e-11 ***
>>> x4           0.24489    0.04957   4.940 7.80e-07 ***
>>> x5           0.28619    0.13810   2.072 0.038235 *
>>> x6          -1.07816    0.90224  -1.195 0.232091
>>> x7          -0.67521    0.32810  -2.058 0.039595 *
>>> x8          -0.76275    0.28824  -2.646 0.008138 **
>>>
>>> It seems very strange that the random slopes should be inestimable: x2B and
>>> x2C are not exceedingly scarce conditions: there are 277 observations of
>>> the former, 91 of the latter. There are 52 IDs with observations of both
>>> x2B = 1 and x2B = 0, and 33 IDs with observations of both x2C = 1 and
>>> x2C = 0. So, I don't understand why the random slopes couldn't be estimated.
>>> Stranger still, if I fit an otherwise identical model with *correlated* random
>>> effects, the random-slope estimates suddenly do differ from 0 (although
>>> there's still a singularity warning). Like so:
>>>
>>> > slopes.corr <- glmer(y ~ (x2B + x2C | id) + x1 + x2B + x2C + x3 + x4 +
>>> +     x5 + x6 + x7 + x8, family = binomial, data = mydata, control =
>>> +     glmerControl(optimizer = "optimx", optCtrl = list(method = "nlm")),
>>> +     nAGQ = 1)
>>> ...
>>> Random effects:
>>>  Groups Name        Variance Std.Dev. Corr
>>>  id     (Intercept) 0.5827   0.7633
>>>         x2B         0.0798   0.2825   -0.22
>>>         x2C         0.2060   0.4539   -0.68 0.86
>>> Number of obs: 1405, groups:  id, 292
>>>
>>> Fixed effects:
>>>             Estimate Std. Error z value Pr(>|z|)
>>> (Intercept) -0.98131    0.30416  -3.226  0.00125 **
>>> x1           0.56391    0.29692   1.899  0.05754 .
>>> x2B          2.56529    0.28735   8.928  < 2e-16 ***
>>> x2C          2.17394    0.41814   5.199 2.00e-07 ***
>>> x3          -0.77852    0.11699  -6.655 2.84e-11 ***
>>> x4           0.24384    0.04982   4.895 9.84e-07 ***
>>> x5           0.28790    0.14005   2.056  0.03981 *
>>> x6          -1.08438    0.91036  -1.191  0.23359
>>> x7          -0.66753    0.32962  -2.025  0.04285 *
>>> x8          -0.75425    0.28913  -2.609  0.00909 **
>>>
>>> It boggles my mind that the 2 random slopes should be inestimable in the
>>> simpler model (with no correlation params) but somehow become estimable
>>> when you introduce 3 more parameters by allowing random-effect
>>> correlations. My brain has melted. Does anyone have a clue what's going on?
>>> The anonymized datafile is available here <https://file.io/VKruszwJBwcK>.
>>>
>>> Best,
>>>
>>> Juho
>>>
>>>
>>>
>>> On Thu 4 Aug 2022 at 0.30, João Veríssimo (jl.verissimo using gmail.com)
>>> wrote:
>>>> (1+x2 || id) is shorter notation for (1 | id) + (0 + x2 | id ).
>>>> And because x2 is a factor, suppressing the intercept leads to the
>>>> 'cell-mean coding' of x2: what is being estimated is the between-id
>>>> variation around the means of each level, A, B, and C (and their
>>>> correlation).
>>>>
>>>> In order to get what you want, turn x2 into two numeric variables
>>>> according to its contrasts. For example:
>>>> x2num1 <- ifelse(x2=="B", 1, 0)
>>>> x2num2 <- ifelse(x2=="C", 1, 0)
>>>>
>>>> Then (1 + x2num1 + x2num2 || id) will give you the random intercept, two
>>>> random slopes and no correlations.
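A quick self-contained illustration of that dummy coding, using toy values of x2 (as an aside, model.matrix() builds the same columns automatically from the default treatment contrasts):

```r
# Toy factor with the three unordered levels from the thread
x2 <- factor(c("A", "B", "C", "B", "A"))

# Hand-built treatment-contrast dummies, as suggested above
x2num1 <- ifelse(x2 == "B", 1, 0)  # 1 when x2 is "B", else 0
x2num2 <- ifelse(x2 == "C", 1, 0)  # 1 when x2 is "C", else 0

# Equivalent: the non-intercept columns of the treatment-coded model
# matrix (columns named "x2B" and "x2C")
mm <- model.matrix(~ x2)[, -1]
```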
>>>>
>>>> João
>>>>
>>>> On 03/08/2022 21:10, Juho Kristian Ruohonen wrote:
>>>>> Dear List,
>>>>>
>>>>> This is a logistic GLMM with 1 grouping factor + 8 fixed-effect covariates.
>>>>> One of the fixed effects, namely x2, has three unordered categories. This
>>>>> is the covariate for whose 2 non-reference categories I want to estimate
>>>>> random slopes, along with the random intercepts with which I don't expect
>>>>> the slopes to be correlated. But I fail:
>>>>>
>>>>>
>>>>>
>>>>> > VarCorr(glmer(y ~ (x2 || id) + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8,
>>>>> +     family = binomial, data = mydata, control =
>>>>> +     glmerControl(optimizer = "optimx", optCtrl = list(method = "nlm")),
>>>>> +     nAGQ = 1))
>>>>>
>>>>> boundary (singular) fit: see help('isSingular')
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>  Groups Name        Std.Dev. Corr
>>>>>  id     (Intercept) 0.00000
>>>>>  id.1   x2A         0.76331
>>>>>         x2B         0.75422  0.931
>>>>>         x2C         0.56139  0.807 0.967
>>>>>
>>>>> ^ Why is it reporting correlations when I told it not to? And why is it
>>>>> reporting the intercept variance as zero (which is wholly implausible)? And
>>>>> why is it reporting a "random slope" for the reference category of x2? It's
>>>>> the reference category, for crying out loud! It's not supposed to get an
>>>>> estimate.
>>>>>
>>>>> Consultation of the lme4 manual
>>>>> <https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf> (page 7)
>>>>> suggests the following alternative syntax for specifying random slopes
>>>>> uncorrelated with the random intercepts:
>>>>>
>>>>> > VarCorr(glmer(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + (1 | id) +
>>>>> +     (0 + x2 | id), family = binomial, data = mydata, control =
>>>>> +     glmerControl(optimizer = "optimx", optCtrl = list(method = "nlm")),
>>>>> +     nAGQ = 1))
>>>>>
>>>>> boundary (singular) fit: see help('isSingular')
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>  Groups Name        Std.Dev. Corr
>>>>>  id     (Intercept) 0.00000
>>>>>  id.1   x2A         0.76331
>>>>>         x2B         0.75422  0.931
>>>>>         x2C         0.56139  0.807 0.967
>>>>>
>>>>> ^ The exact same strangeness persists. Correlations are being estimated
>>>>> against my wishes, there's a nonsensical parameter ostensibly representing
>>>>> the reference category, and an implausible zero value is reported for the
>>>>> random-intercept variance. What am I doing wrong?
>>>>>
>>>>> Best,
>>>>>
>>>>> Juho
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> R-sig-mixed-models using r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>
>> --
>> Dr. Benjamin Bolker
>> Professor, Mathematics & Statistics and Biology, McMaster University
>> Director, School of Computational Science and Engineering
>> (Acting) Graduate chair, Mathematics & Statistics
>> E-mail is sent at my convenience; I don't expect replies outside of
>> working hours.
>>
>>
>