[R-sig-ME] Problem specifying uncorrelated random intercepts and slopes for a multi-df covariate

João Veríssimo j|@ver|@@|mo @end|ng |rom gm@||@com
Fri Aug 5 16:57:34 CEST 2022


Including correlation parameters can "pull" random effects (so to speak) 
in the direction of the estimated correlation. See here:
https://doingbayesiandataanalysis.blogspot.com/2019/07/shrinkage-in-hierarchical-models-random.html

So perhaps the slopes are changing quite a bit due to their correlation 
with the intercept?
You could do some experimentation to test this, for example, keeping the 
correlation between slopes, but removing the intercept-slope correlations.
Maybe something like this would do it: (1 | id) + (0 + x2B + x2C | id)

(that said, it does seem a bit strange to me that both of them are zero 
when not estimating correlations)

João

On 05/08/2022 16:16, Juho Kristian Ruohonen wrote:
> Many thanks, Ben. While I find your example of singularity a bit hard to
> follow, I hope to have correctly identified a gist of "not enough data to
> observe/estimate between-cluster variation in the slopes of x2B and x2C."
>
> Even if that explains the zero estimates though, the fact that those
> estimates become non-zero after adding correlation parameters remains
> completely mystifying to me.
>
> I've now switched to a different file hosting service: hopefully my dataset
> <https://gofile.io/d/pS7O1Q> doesn't get deleted this time.
>
> Best,
>
> Juho
>
>
>
> pe 5. elok. 2022 klo 1.12 Ben Bolker (bbolker using gmail.com) kirjoitti:
>
>>     I will take a look if I get a chance.
>>
>>     "Singular" isn't quite the same as "inestimable", I think (although
>> to be honest I'm not sure what your definition of "inestimable" is). It
>> just means that the best estimate is on the boundary of the feasible
>> space. There are various more and less mathy ways of
>> restating/explaining that, one simple example is if you have a single
>> grouping variable; if true between-group variance is v_b and true
>> within-group variance is v_w, and there are N samples per group, the
>> expected *observed* among-group variation+ is v_b + v_w/n. If the sample
>> variance within groups is v'_w and the sample variance among groups is
>> LESS THAN v'_w/n, then your best conclusion is that the between-group
>> variance is negative! (or zero, if you're not allowing such
>> impossibilities).
>>
>>     Singular fits are most common when the model is overfitted (too much
>> complexity/too much noise/not enough signal/not enough groups), but can
>> happen in many different circumstances.
>>
>>     When I try to retrieve your data file the web page says it has been
>> deleted.
>>
>>
>>
>> On 2022-08-04 11:07 a.m., Juho Kristian Ruohonen wrote:
>>> Many thanks, Ben and João. I did as advised, converting x2 into two
>> dummies
>>> and specifying the random effects as *(x2B+x2C||id)*. This yields the
>>> correct number of estimated parameters (1 random intercepts, 2 random
>>> slopes). However, there's something I don't understand about the results.
>>>
>>> Firstly, there's a warning about a singular fit, which I take to mean
>> that
>>> some parameters are inestimable. Judging from the following output, I
>>> gather that it must be the 2 random slopes, which are estimated at
>>> essentially zero:
>>>
>>> *> summary(slopes.nocorr)*
>>> *...*
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *Random effects: Groups Name        Variance  Std.Dev.  id
>>   (Intercept)
>>> 5.539e-01 7.443e-01 id.1   x2B         1.546e-14 1.243e-07 id.2   x2C
>>>     0.000e+00 0.000e+00Number of obs: 1405, groups:  id, 292Fixed effects:
>>>           Estimate Std. Error z value Pr(>|z|)    (Intercept) -0.93057
>>>    0.27450  -3.390 0.000699 ***x1           0.51158    0.26550   1.927
>>> 0.053997 .  x2B          2.54505    0.20936  12.156  < 2e-16 ***x2C
>>>    2.30179    0.30480   7.552 4.29e-14 ***x3          -0.77494    0.11660
>>>    -6.646 3.01e-11 ***x4           0.24489    0.04957   4.940 7.80e-07
>> ***x5
>>>             0.28619    0.13810   2.072 0.038235 *  x6          -1.07816
>>>    0.90224  -1.195 0.232091    x7          -0.67521    0.32810  -2.058
>>> 0.039595 *  *
>>> *x8          -0.76275    0.28824  -2.646 0.008138 ** *
>>>
>>> It seems very strange that the random slopes should be inestimable: x2B
>> and
>>> x2C are not exceedingly scarce conditions: there are 277 observations of
>>> the former, 91 of the latter. There are 52 IDs with observations of both
>> xB
>>> = 1 and xB = 0. And there are 33 IDs with observations of both xC = 1 and
>>> xC = 0. So, I don't understand why the random slopes couldn't be
>> estimated.
>>> Stranger still, if I fit an otherwise identical model with *correlated*
>> random
>>> effects, the random-slope estimates suddenly do differ from 0 (although
>>> there's still a singularity warning). Like so:
>>>
>>> *> slopes.corr <-   glmer(y ~ (x2B+x2C|id)  + x1 + x2B + x2C + x3 + x4 +
>> x5
>>> + x6 + x7 + x8, family = binomial, data = mydata, control =
>>> glmerControl(optimizer = "optimx", optCtrl = list(method = "nlm")), nAGQ
>> =
>>> 1)*
>>> *...*
>>>
>>>
>>>
>>>
>>>
>>>
>>> *Random effects: Groups Name        Variance Std.Dev. Corr        id
>>> (Intercept) 0.5827   0.7633                      x2B         0.0798
>>> 0.2825   -0.22              x2C         0.2060   0.4539   -0.68
>> 0.86Number
>>> of obs: 1405, groups:  id, 292*
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *Fixed effects:            Estimate Std. Error z value Pr(>|z|)
>>>    (Intercept) -0.98131    0.30416  -3.226  0.00125 ** x1
>>   0.56391
>>>    0.29692   1.899  0.05754 .  x2B          2.56529    0.28735   8.928  <
>>> 2e-16 ***x2C          2.17394    0.41814   5.199 2.00e-07 ***x3
>>>    -0.77852    0.11699  -6.655 2.84e-11 ***x4           0.24384    0.04982
>>> 4.895 9.84e-07 ***x5           0.28790    0.14005   2.056  0.03981 *  x6
>>>          -1.08438    0.91036  -1.191  0.23359    x7          -0.66753
>>>    0.32962  -2.025  0.04285 *  x8          -0.75425    0.28913  -2.609
>>>    0.00909 ** *
>>>
>>> It boggles my mind that the 2 random slopes should be inestimable in the
>>> simpler model (with no correlation params) but somehow become estimable
>>> when you introduce 3 more parameters by allowing random-effect
>>> correlations. My brain has melted. Does anyone have a clue what's going
>> on?
>>> The anonymized datafile is available here <https://file.io/VKruszwJBwcK
>>> .
>>>
>>> Best,
>>>
>>> Juho
>>>
>>>
>>>
>>> to 4. elok. 2022 klo 0.30 João Veríssimo (jl.verissimo using gmail.com)
>> kirjoitti:
>>>> (1+x2 || id) is shorter notation for (1 | id) + (0 + x2 | id ).
>>>> And because x2 is a factor, suppressing the intercept leads to the
>>>> 'cell-mean coding' of x2: what is being estimated is the between-id
>>>> variation around the means of each level, A, B, and C (and their
>>>> correlation).
>>>>
>>>> In order to get what you want, turn x2 into two numeric variables
>>>> according to its contrasts. For example:
>>>> x2num1 <- ifelse(x2=="B", 1, 0)
>>>> x2num2 <- ifelse(x2=="C", 1, 0)
>>>>
>>>> Then (1 + x2num1 + x2num2 || id) will give you the random intercept, two
>>>> random slopes and no correlations.
>>>>
>>>> João
>>>>
>>>> On 03/08/2022 21:10, Juho Kristian Ruohonen wrote:
>>>>> Dear List,
>>>>>
>>>>> This is a logistic GLMM with 1 grouping factor + 8 fixed-effect
>>>> covariates.
>>>>> One of the fixed effects, namely x2, has three unordered categories.
>> This
>>>>> is the covariate for whose 2 non-reference categories I want to
>> estimate
>>>>> random slopes, along with the random intercepts with which I don't
>> expect
>>>>> the slopes to be correlated. But I fail:
>>>>>
>>>>>
>>>>>
>>>>> *> VarCorr(glmer(y ~ (x2||id) + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8,
>>>>> family = binomial, data = mydata, control = glmerControl(optimizer =
>>>>> "optimx", optCtrl = list(method = "nlm")), nAGQ = 1))*
>>>>>
>>>>>
>>>>> *boundary (singular) fit: see help('isSingular')*
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> * Groups Name        Std.Dev. Corr        id     (Intercept) 0.00000
>>>>>           id.1   x2A         0.76331                     x2B
>>>>    0.75422
>>>>>     0.931              x2C         0.56139  0.807 0.967*
>>>>>
>>>>> ^ Why is it reporting correlations when I told it not to? And why is it
>>>>> reporting the intercept variance as zero (which is wholly implausible)?
>>>> And
>>>>> why is it reporting a "random slope" for the reference category of x2?
>>>> It's
>>>>> the reference category, for crying out loud! It's not supposed to get
>> an
>>>>> estimate.
>>>>>
>>>>> Consultation of the lme4 manual
>>>>> <https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf>
>> (page
>>>> 7)
>>>>> suggests the following alternative syntax for specifying random slopes
>>>>> uncorrelated with the random intercepts:
>>>>>
>>>>> *> VarCorr(glmer(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + (1|id) +
>>>>> (0+x2|id), family = binomial, data = mydata, control =
>>>>> glmerControl(optimizer = "optimx", optCtrl = list(method = "nlm")),
>> nAGQ
>>>> =
>>>>> 1))*
>>>>>
>>>>> *boundary (singular) fit: see help('isSingular')*
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> * Groups Name        Std.Dev. Corr        id     (Intercept) 0.00000
>>>>>           id.1   x2A         0.76331                     x2B
>>>>    0.75422
>>>>>     0.931              x2C         0.56139  0.807 0.967*
>>>>>
>>>>> ^ The exact same strangeness persists. Correlations are being estimated
>>>>> against my wishes, and there's a nonsensical parameter supposedly
>>>>> ostensibly representing the reference category, plus an implausible
>> zero
>>>>> value reported on the random intercepts. What am I doing wrong?
>>>>>
>>>>> Best,
>>>>>
>>>>> Juho
>>>>>
>>>>>         [[alternative HTML version deleted]]
>>>>>
>>>>> _______________________________________________
>>>>> R-sig-mixed-models using r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>
>>>        [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-mixed-models using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>> --
>> Dr. Benjamin Bolker
>> Professor, Mathematics & Statistics and Biology, McMaster University
>> Director, School of Computational Science and Engineering
>> (Acting) Graduate chair, Mathematics & Statistics
>>   > E-mail is sent at my convenience; I don't expect replies outside of
>> working hours.
>>
>> _______________________________________________
>> R-sig-mixed-models using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models



More information about the R-sig-mixed-models mailing list