[R-sig-ME] Perfectly correlated random effects (when they shouldn't be)

Wed Jul 15 23:18:23 CEST 2015


On 15-07-15 04:38 PM, Ben Bolker wrote:
> On 15-07-15 12:52 PM, Paul Buerkner wrote:
>> If you look at the results from a Bayesian perspective, it seems
>> to be a typical "problem" of ML procedures estimating the mode.
>> 
>> The mode is nothing special, just the point where the density is
>> maximal. When you have a skewed distribution (as is usual for
>> correlations), the mode will often be close to the borders of the
>> region of definition (-1 or 1 in this case). The posterior
>> distribution of the correlation, however, can still be very wide,
>> ranging from strong negative correlation to strong positive
>> correlation, especially when the number of levels of a grouping
>> factor is not that large. In those cases, zero (i.e.
>> insignificant) correlation is a very likely value even if the
>> mode itself is extreme.
>> 
>> I tried fitting your models with Bayesian R packages (brms and
>> MCMCglmm). Unfortunately, because you have so many observations
>> and quite a few random effects, they run relatively slowly, so I
>> am still waiting for the results.
> 
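
The point quoted above about few grouping levels can be illustrated with a rough base-R sketch. It assumes the group-level effects are observed directly and are truly uncorrelated, which is not the case in a real mixed model, so if anything it understates the uncertainty. The function name `sim_cor` and all settings are made up for illustration; 6 and 18 are the numbers of regions tried in this thread.

```r
# Sketch: how widely does an estimated correlation scatter when it is
# based on only a handful of groups?
# Assumption: the two group-level effects are observed without error and
# have true correlation rho = 0.
set.seed(101)

sim_cor <- function(n_groups, rho = 0, n_sim = 2000) {
  replicate(n_sim, {
    z1 <- rnorm(n_groups)
    z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n_groups)
    cor(z1, z2)
  })
}

# central 95% range of the estimated correlation for each number of groups
# (200 groups is included only for contrast)
spread <- sapply(c(6, 18, 200), function(n) {
  quantile(sim_cor(n), c(0.025, 0.975))
})
colnames(spread) <- c("n=6", "n=18", "n=200")
print(round(spread, 2))
```

The range narrows as the number of groups grows; with very few groups, estimates far from the true value are unremarkable.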

  You can also use blme, which implements a very thin Bayesian wrapper
around [g]lmer and does maximum _a posteriori_ (i.e. Bayesian mode)
estimates with weak (but principled) priors on the random-effects
variance parameters. It's based on

Chung, Yeojin, Sophia Rabe-Hesketh, Vincent Dorie, Andrew Gelman, and
Jingchen Liu. “A Nondegenerate Penalized Likelihood Estimator for
Variance Parameters in Multilevel Models.” Psychometrika 78, no. 4
(March 12, 2013): 685–709. doi:10.1007/s11336-013-9328-2.

  Profile 'likelihood' confidence intervals based on blme will get you
a reasonable approximation of the width of the credible interval,
although it's a little bit of a cheesy/awkward combination between
marginal (proper Bayesian) and conditional (MAP/cheesy-Bayesian)
measures of uncertainty.
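
A minimal sketch of the blme approach, on a small simulated data set rather than the poster's data (the names `d`, `g`, `x`, `fit_ml` and `fit_map` and all simulation settings are assumptions for illustration). It fits the same random-slope logistic model with `glmer` and with `bglmer`, leaving `bglmer`'s covariance prior at its default, and extracts the estimated intercept-slope correlation from each fit.

```r
library(lme4)
library(blme)

set.seed(101)
n_groups <- 6     # deliberately few groups, as in the original question
n_per    <- 200   # observations per group (arbitrary)

d <- data.frame(g = factor(rep(seq_len(n_groups), each = n_per)),
                x = rnorm(n_groups * n_per))

# true group-level intercepts and slopes, drawn independently
# (so the true correlation between them is zero)
b0 <- rnorm(n_groups, sd = 0.5)
b1 <- rnorm(n_groups, sd = 0.3)
gi <- as.integer(d$g)
eta <- b0[gi] + (0.5 + b1[gi]) * d$x
d$y <- rbinom(nrow(d), size = 1, prob = plogis(eta))

# ML fit (may report a singular fit or a correlation at +/-1)
fit_ml  <- glmer(y ~ x + (1 + x | g), data = d, family = binomial)

# MAP fit with blme's default prior on the random-effects covariance
fit_map <- bglmer(y ~ x + (1 + x | g), data = d, family = binomial)

cor_ml  <- attr(VarCorr(fit_ml)$g,  "correlation")[1, 2]
cor_map <- attr(VarCorr(fit_map)$g, "correlation")[1, 2]
print(c(ML = cor_ml, MAP = cor_map))
```

The profile intervals mentioned above would presumably come from something like `confint(fit_map, parm = "theta_", method = "profile")`; that call is not run here, and it can be slow on a data set the size of the one in this thread.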

>> 2015-07-15 3:45 GMT+02:00 svm <steven.v.miller at gmail.com>:
>> 
>>> I considered that. I disaggregated the region random effect
>>> from 6 to 18 (the latter of which approximates the World Bank's
>>> region classification). I'm still encountering the same curious
>>> issue.
>>> 
>>> Random effects:
>>>  Groups       Name        Variance  Std.Dev. Corr
>>>  country:wave (Intercept) 0.1530052 0.39116
>>>  country      (Intercept) 0.3735876 0.61122
>>>  wbregion     (Intercept) 0.0137822 0.11740
>>>               x1          0.0009384 0.03063  -1.00
>>>               x2          0.0767387 0.27702  -1.00  1.00
>>> Number of obs: 212570, groups:
>>> country:wave, 143; country, 82; wbregion, 18
>>> 
>>> For what it's worth: the model estimates fine. The results are
>>> intuitive and theoretically consistent. They also don't change
>>> if I remove that region random effect. I'd like to keep the
>>> region random effect (with varying slopes) in the model. I'm
>>> struggling with what I should think about the perfect
>>> correlations.
>>> 
>>> On Tue, Jul 14, 2015 at 9:07 PM, Jake Westfall
>>> <jake987722 at hotmail.com> wrote:
>>> 
>>>> Hi Steve,
>>>> 
>>>> 
>>>> I think the issue is that estimating 3 variances and 3
>>>> covariances for regions is quite ambitious given that there
>>>> are only 6 regions. I think it's not surprising that the
>>>> model has a hard time getting good estimates of those
>>>> parameters.
>>>> 
>>>> 
>>>> Jake
>>>> 
>>>>> Date: Tue, 14 Jul 2015 20:53:01 -0400
>>>>> From: steven.v.miller at gmail.com
>>>>> To: r-sig-mixed-models at r-project.org
>>>>> Subject: [R-sig-ME] Perfectly correlated random effects (when
>>>>> they shouldn't be)
>>>> 
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> I'm a long-time reader and wanted to raise a question I've seen
>>>>> asked here before about correlated random effects. Past answers
>>>>> I have encountered on this listserv explain that perfectly
>>>>> correlated random effects suggest model overfitting and
>>>>> variances of random effects that are effectively zero and can
>>>>> be omitted for a simpler model. In my case, I don't think that's
>>>>> what is happening here, though I could well be fitting a poor
>>>>> model in glmer.
>>>>> 
>>>>> I'll describe the nature of the data first. I'm modeling
>>>>> individual-level survey data for countries across multiple waves
>>>>> and am estimating the region of the globe as a random effect as
>>>>> well. I have three random effects (country, country-wave, and
>>>>> region). In the region random effect, I am allowing
>>>>> country-wave-level predictors to have varying slopes. My inquiry
>>>>> is whether some country-wave-level contextual indicator can have
>>>>> an overall effect (as a fixed effect), the effect of which can
>>>>> vary by region. In other words: is the effect of some
>>>>> country-level indicator (e.g. unemployment) in a given year
>>>>> different for countries in Western Europe than for countries in
>>>>> Africa even if, on average, there is a positive or negative
>>>>> association at the individual-level? These country-wave-level
>>>>> predictors that I allow to vary by region are the ones reporting
>>>>> perfect correlation and I'm unsure how to interpret that (or if
>>>>> I'm estimating the model correctly).
>>>>> 
>>>>> I should also add that I have individual-level predictors
>>>>> as well as country-wave-level predictors, though it's the
>>>>> latter that concerns me. Further, every non-binary
>>>>> indicator in the model is standardized by two standard
>>>>> deviations.
>>>>> 
>>>>> For those interested, I have a reproducible (if rather
>>>>> large) example below. Dropbox link to the data is here:
>>>>>
>>>>> https://www.dropbox.com/s/t29jfwm98tsdr71/correlated-random-effects.csv?dl=0
>>>>>
>>>>> In this reproducible example, y is the outcome variable and x1
>>>>> and x2 are two country-wave-level predictors I allow to vary by
>>>>> region. Both x1 and x2 are interval-level predictors that I
>>>>> standardized to have a mean of zero and a standard deviation of
>>>>> .5 (per Gelman's (2008) recommendation).
>>>>> 
>>>>> I estimate the following model.
>>>>> 
>>>>> summary(M1 <- glmer(y ~ x1 + x2 + (1 | country) +
>>>>>                       (1 | country:wave) + (1 + x1 + x2 | region),
>>>>>                     data=subset(Data),
>>>>>                     family=binomial(link="logit")))
>>>>> 
>>>>> The results are theoretically intuitive. I think they make
>>>>> sense. However, I get a report of perfect correlation for the
>>>>> varying slopes of the region random effect.
>>>>> 
>>>>> Random effects:
>>>>>  Groups       Name        Variance Std.Dev. Corr
>>>>>  country:wave (Intercept) 0.15915  0.3989
>>>>>  country      (Intercept) 0.32945  0.5740
>>>>>  region       (Intercept) 0.01646  0.1283
>>>>>               x1          0.02366  0.1538    1.00
>>>>>               x2          0.13994  0.3741   -1.00 -1.00
>>>>> Number of obs: 212570, groups:
>>>>> country:wave, 143; country, 82; region, 6
>>>>> 
>>>>> What should I make of this and am I estimating this model
>>>>> wrong? For what it's worth, the dotplot of the region random
>>>>> effect (with conditional variance) makes sense and is
>>>>> theoretically intuitive, given my data.
>>>>> (http://i.imgur.com/mrnaJ77.png)
>>>>> 
>>>>> Any help would be greatly appreciated.
>>>>> 
>>>>> Best regards, Steve
>>>>> 
>>>>> [[alternative HTML version deleted]]
>>>>> 
>>>>> _______________________________________________ 
>>>>> R-sig-mixed-models at r-project.org mailing list 
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Steven V. Miller
>>> Assistant Professor
>>> Department of Political Science
>>> Clemson University
>>> http://svmiller.com
