[R-sig-ME] Perfectly correlated random effects (when they shouldn't be)
Paul Buerkner
paul.buerkner at gmail.com
Sat Jul 18 23:38:41 CEST 2015
That looks better.
I used the development version of brms from GitHub, available through

library(devtools)
install_github("paul-buerkner/brms")

because the release version on CRAN does not yet support all relevant
features (the country:wave syntax is not accepted by brms 0.3.0).
The syntax I used was

library(brms)

# maximal model: random slopes for x1 and x2 at the region level
fit1_brm <- brm(y ~ x1 + x2 + (1 | country) + (1 | country:wave) +
                  (1 + x1 + x2 | region),
                data = data, family = "bernoulli",
                n.warmup = 100, n.iter = 500, n.chains = 1)
summary(fit1_brm)
plot(fit1_brm)

# reduced model: random intercepts only at the region level
fit2_brm <- brm(y ~ x1 + x2 + (1 | country) + (1 | country:wave) +
                  (1 | region),
                data = data, family = "bernoulli",
                n.warmup = 100, n.iter = 500, n.chains = 1)
summary(fit2_brm)
plot(fit2_brm)
Remember that you have to install Stan to make brms work. It will take
several hours to fit those models, so be patient. :-)
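A minimal sketch of that setup (assuming the CRAN build of rstan is
available for your platform; exact installation steps may differ):

install.packages("rstan")  # brms generates Stan code and fits it via rstan
library(rstan)             # compiling models also needs a C++ toolchain (e.g. Rtools on Windows)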
2015-07-18 22:29 GMT+02:00 svm <steven.v.miller at gmail.com>:
> Genuinely had no idea about that brms package in R. Is it new? It looks
> very intriguing, though the syntax I'm seeing in the reference manual
> doesn't appear to be working. Do you mind if I see the code? Maybe I'm
> doing it wrong.
>
> I followed Ben's advice and pursued a blmer strategy with a disaggregated
> version of the random-effects variable. I ran the full models (so: not the
> simple reproducible example) and got something that makes more sense. I
> also corrected for various weird data points (e.g. countries whose surveys
> were likely botched in execution). Here is the summary of the random
> effects:
>
> Random effects:
>  Groups       Name        Variance Std.Dev. Corr
>  country:wave (Intercept) 0.16091  0.4011
>  country      (Intercept) 0.34949  0.5912
>  region       (Intercept) 0.04604  0.2146
>               eti         0.24501  0.4950    0.20
>               sti         0.18997  0.4359   -0.45 -0.04
> Number of obs: 208551, groups: country:wave, 143; country, 82; region, 13
>
> I think this makes sense. The regions ultimately don't vary that much from
> each other, which I think is intuitive, given the data. Importantly, the
> correlations among the random slopes are neither 0 nor -1 or 1.
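>
> For reference, a sketch of the call implied by that summary (assuming
> blme's bglmer with its default priors; the additional controls in the
> full model are omitted here):
>
> library(blme)
> fit_blme <- bglmer(y ~ eti + sti + (1 | country) + (1 | country:wave) +
>                      (1 + eti + sti | region),
>                    data = data, family = binomial(link = "logit"))
> summary(fit_blme)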
>
> In a full model (with controls), the z-value associated with x1 is 1.741
> (coefficient: .516, s.e.: .296), which is below the conventional cutoff of
> z = 1.96. My field is kind of a "cult of p < .05" crowd, but I'm not
> terribly bothered by this. I do believe what I'm doing here is a very
> conservative test of whether x1 is discernible from zero across the range
> of the data.
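>
> As a quick check, the implied two-sided p-value under a normal reference
> distribution for that Wald z:
>
> z <- 0.516 / 0.296   # = 1.743
> 2 * pnorm(-abs(z))   # ~ 0.081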
>
> As for the comparison between x1 in fit2_brm versus the maximal model in
> lme4: I honestly don't know. I learn something new each time I do a mixed
> effects analysis, but I can offer no immediate answer for that.
>
> I definitely do appreciate the input of this listserv. I'm more than
> grateful for your time and patience with me.
>
> Best regards,
> Steve
>
> On Sat, Jul 18, 2015 at 2:40 PM, Paul Buerkner <paul.buerkner at gmail.com>
> wrote:
>
>> I managed to fit the models with Bayesian techniques using the package
>> brms.
>>
>> Because your data set is so large, I decided to run only 500 iterations,
>> of which 100 are warmup (i.e. burn-in). As brms uses Stan on the backend,
>> the quality of the samples is relatively high, so 400 Markov chain
>> samples should be enough for our purposes.
>>
>> Below I present the results of the maximal model as well as the model
>> without any random effects for x1 and x2. The related plots are attached.
>>
>> First, the maximal model (fit1_brm):
>>
>> Family: bernoulli (logit)
>> Formula: y ~ x1 + x2 + (1 | country) + (1 | country:wave) + (1 + x1 +
>> x2 | region)
>> Data: data (Number of observations: 212570)
>> Samples: 1 chains, each with n.iter = 500; n.warmup = 100; n.thin = 1;
>> total post-warmup samples = 400
>>
>> Random Effects:
>> ~country (Number of levels: 82)
>> Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
>> sd(Intercept) 0.59 0.08 0.44 0.75 140 1
>>
>> ~country:wave (Number of levels: 143)
>> Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
>> sd(Intercept) 0.42 0.04 0.34 0.51 148 1
>>
>> ~region (Number of levels: 6)
>> Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
>> sd(Intercept) 0.24 0.20 0.01 0.77 300 1
>> sd(x1) 0.41 0.38 0.02 1.44 398 1
>> sd(x2) 0.57 0.45 0.03 1.75 289 1
>> cor(Intercept,x1) 0.04 0.47 -0.84 0.86 400 1
>> cor(Intercept,x2) -0.20 0.46 -0.90 0.70 400 1
>> cor(x1,x2) -0.14 0.50 -0.92 0.87 400 1
>>
>> Fixed Effects:
>> Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
>> Intercept -0.61 0.15 -0.91 -0.36 400 1
>> x1 0.43 0.39 -0.31 1.20 400 1
>> x2 0.02 0.40 -0.69 0.81 344 1
>>
>> Samples were drawn using NUTS(diag_e). For each parameter, Eff.Sample is a
>> crude measure of effective sample size, and Rhat is the potential scale
>> reduction factor on split chains (at convergence, Rhat = 1).
>>
>>
>>
>>
>> From what I see, the results differ from lme4 in four aspects.
>>
>> 1. Indeed, there does not seem to be enough evidence in the data to
>> estimate the random-effects correlations reasonably, as implied by the
>> huge CIs. The estimates (means in this case) are close to 0 instead of
>> -1 or 1, which seems more intuitive from my perspective.
>> 2. While the random-effects standard deviations of country and
>> country:wave are similar to lme4 (their posterior distributions are
>> nearly symmetric; see plot_fit1_brm in the attachment), the sds of
>> region are considerably larger than the ones estimated by lme4. This is
>> because the mode is typically smaller than the mean when the
>> distribution is right-skewed (see the short demonstration after this
>> list). In particular, from my perspective, they do not seem to be so
>> small that removing them from the model would be justified (I am
>> currently running the same model without correlations, but the results
>> shouldn't be too different).
>> 3. The estimate of x1 is a bit smaller than in lme4 (0.43 against 0.49).
>> Unfortunately, I do not have any immediate explanation for this...
>> 4. The standard error of x1 has nearly doubled as compared to lme4 (0.39
>> against 0.2). As a result, the CI is almost twice as large. Most likely,
>> this is because the random-effects sd of x1 is much larger than in lme4,
>> which also affects the standard error of the fixed effect.
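>>
>> A short demonstration of the mode-vs-mean point in 2., using a lognormal
>> draw as a stand-in for a right-skewed posterior (an illustration only,
>> not the actual posterior of these sds):
>>
>> set.seed(1)
>> x <- rlnorm(1e5, meanlog = 0, sdlog = 0.8)
>> mean(x)              # ~1.38
>> d <- density(x)
>> d$x[which.max(d$y)]  # kernel estimate of the mode, ~0.5, well below the mean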
>>
>> To test whether 4. is solely due to the random effects of x1, I
>> completely removed them in the second model (fit2_brm):
>>
>> Family: bernoulli (logit)
>> Formula: y ~ x1 + x2 + (1 | country) + (1 | country:wave) + (1 | region)
>> Data: data (Number of observations: 212570)
>> Samples: 1 chains, each with n.iter = 500; n.warmup = 100; n.thin = 1;
>> total post-warmup samples = 400
>>
>> Random Effects:
>> ~country (Number of levels: 82)
>> Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
>> sd(Intercept) 0.61 0.08 0.49 0.77 96 1.03
>>
>> ~country:wave (Number of levels: 143)
>> Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
>> sd(Intercept) 0.4 0.04 0.34 0.49 197 1.01
>>
>> ~region (Number of levels: 6)
>> Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
>> sd(Intercept) 0.27 0.19 0.03 0.72 278 1
>>
>> Fixed Effects:
>> Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
>> Intercept -0.61 0.15 -0.95 -0.31 400 1.01
>> x1 0.42 0.25 -0.05 0.89 400 1.00
>> x2 -0.04 0.22 -0.46 0.37 400 1.00
>>
>> Samples were drawn using NUTS(diag_e). For each parameter, Eff.Sample is a
>> crude measure of effective sample size, and Rhat is the potential scale
>> reduction factor on split chains (at convergence, Rhat = 1).
>>
>>
>> The standard error of x1 indeed went down, ending up close to what lme4
>> estimated (0.25 against 0.24). What I find suspicious is that the maximal
>> model in lme4 (i.e. the one including random effects for x1) estimates a
>> lower standard error for the fixed effect of x1 than the model without
>> those random effects (see the results of Reinhold Klingl). From my
>> perspective, this is unlikely to be correct. What's your opinion on this?
>>
>>
>> 2015-07-15 23:18 GMT+02:00 Ben Bolker <bbolker at gmail.com>:
>>
>> > On 15-07-15 04:38 PM, Ben Bolker wrote:
>> > > On 15-07-15 12:52 PM, Paul Buerkner wrote:
>> > >> if you look at the results from a Bayesian perspective, it seems
>> > >> to be a typical "problem" of ML procedures estimating the mode.
>> > >>
>> > >> The mode is nothing special, just the point where the density is
>> > >> maximal. When you have a skewed distribution (as is usual for
>> > >> correlations), the mode will often be close to the borders of the
>> > >> region of definition (-1 or 1 in this case). The posterior
>> > >> distribution of the correlation, however, can still be very wide,
>> > >> ranging from strong negative correlation to strong positive
>> > >> correlation, especially when the number of levels of a grouping
>> > >> factor is not that large. In those cases, zero (i.e.
>> > >> non-significant) correlation is a very likely value even if the
>> > >> mode itself is extreme.
>> > >>
>> > >> I tried fitting your models with Bayesian R packages (brms and
>> > >> MCMCglmm). Unfortunately, because you have so many observations
>> > >> and quite a few random effects, they run relatively slowly, so I
>> > >> am still waiting for the results.
>> > >
>> >
>> > You can also use blme, which implements a very thin Bayesian wrapper
>> > around [g]lmer and computes maximum _a posteriori_ (i.e. Bayesian
>> > mode) estimates with weak (but principled) priors on the random
>> > effects -- it's based on
>> >
>> > Chung, Yeojin, Sophia Rabe-Hesketh, Vincent Dorie, Andrew Gelman, and
>> > Jingchen Liu. “A Nondegenerate Penalized Likelihood Estimator for
>> > Variance Parameters in Multilevel Models.” Psychometrika 78, no. 4
>> > (March 12, 2013): 685–709. doi:10.1007/s11336-013-9328-2.
>> >
>> > Profile 'likelihood' confidence intervals based on blme will get you
>> > a reasonable approximation of the width of the credible interval,
>> > although it's a little bit of a cheesy/awkward combination between
>> > marginal (proper Bayesian) and conditional (MAP/cheesy-Bayesian)
>> > measures of uncertainty.
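>> >
>> > A minimal sketch of that route, assuming bglmer accepts the same
>> > formula as the glmer call below and that confint()'s profile method
>> > carries over from merMod (expect long runtimes at this data size):
>> >
>> > library(blme)
>> > fit_blme <- bglmer(y ~ x1 + x2 + (1 | country) + (1 | country:wave) +
>> >                      (1 + x1 + x2 | region),
>> >                    data = data, family = binomial(link = "logit"))
>> > confint(fit_blme, method = "profile")  # approximate credible-interval widths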
>> >
>> >
>> >
>> >
>> >
>> > >> 2015-07-15 3:45 GMT+02:00 svm <steven.v.miller at gmail.com>:
>> > >>
>> > >>> I considered that. I disaggregated the region random effect
>> > >>> from 6 to 18 (the latter of which approximates the World Bank's
>> > >>> region classification). I'm still encountering the same curious
>> > >>> issue.
>> > >>>
>> > >>> Random effects:
>> > >>>  Groups       Name        Variance  Std.Dev. Corr
>> > >>>  country:wave (Intercept) 0.1530052 0.39116
>> > >>>  country      (Intercept) 0.3735876 0.61122
>> > >>>  wbregion     (Intercept) 0.0137822 0.11740
>> > >>>               x1          0.0009384 0.03063  -1.00
>> > >>>               x2          0.0767387 0.27702  -1.00  1.00
>> > >>> Number of obs: 212570, groups: country:wave, 143; country, 82;
>> > >>> wbregion, 18
>> > >>>
>> > >>> For what it's worth: the model estimates fine. The results are
>> > >>> intuitive and theoretically consistent. They also don't change
>> > >>> if I were to remove that region random effect. I'd like to keep
>> > >>> the region random effect (with varying slopes) in the model.
>> > >>> I'm struggling with what I should think about the perfect
>> > >>> correlations.
>> > >>>
>> > >>> On Tue, Jul 14, 2015 at 9:07 PM, Jake Westfall
>> > >>> <jake987722 at hotmail.com> wrote:
>> > >>>
>> > >>>> Hi Steve,
>> > >>>>
>> > >>>>
>> > >>>> I think the issue is that estimating 3 variances and 3
>> > >>>> covariances for regions is quite ambitious given that there
>> > >>>> are only 6 regions. I think it's not surprising that the
>> > >>>> model has a hard time getting good estimates of those
>> > >>>> parameters.
>> > >>>>
>> > >>>>
>> > >>>> Jake
>> > >>>>
>> > >>>>> Date: Tue, 14 Jul 2015 20:53:01 -0400
>> > >>>>> From: steven.v.miller at gmail.com
>> > >>>>> To: r-sig-mixed-models at r-project.org
>> > >>>>> Subject: [R-sig-ME] Perfectly correlated random effects (when
>> > >>>>> they shouldn't be)
>> > >>>>
>> > >>>>>
>> > >>>>> Hi all,
>> > >>>>>
>> > >>>>> I'm a long-time reader and wanted to raise a question I've seen
>> > >>>>> asked here before about correlated random effects. Past answers
>> > >>>>> I have encountered on this listserv explain that perfectly
>> > >>>>> correlated random effects suggest model overfitting and
>> > >>>>> random-effect variances that are effectively zero and can be
>> > >>>>> omitted for a simpler model. In my case, I don't think that's
>> > >>>>> what is happening here, though I could well be fitting a poor
>> > >>>>> model in glmer.
>> > >>>>>
>> > >>>>> I'll describe the nature of the data first. I'm modeling
>> > >>>>> individual-level survey data for countries across multiple waves
>> > >>>>> and am estimating the region of the globe as a random effect as
>> > >>>>> well. I have three random effects (country, country-wave, and
>> > >>>>> region). In the region random effect, I am allowing
>> > >>>>> country-wave-level predictors to have varying slopes. My inquiry
>> > >>>>> is whether some country-wave-level contextual indicator can have
>> > >>>>> an overall effect (as a fixed effect), the effect of which can
>> > >>>>> vary by region. In other words: is the effect of some
>> > >>>>> country-level indicator (e.g. unemployment) in a given year
>> > >>>>> different for countries in Western Europe than for countries in
>> > >>>>> Africa even if, on average, there is a positive or negative
>> > >>>>> association at the individual level? These country-wave-level
>> > >>>>> predictors that I allow to vary by region are the ones reporting
>> > >>>>> perfect correlation, and I'm unsure how to interpret that (or if
>> > >>>>> I'm estimating the model correctly).
>> > >>>>>
>> > >>>>> I should also add that I have individual-level predictors as
>> > >>>>> well as country-wave-level predictors, though it's the latter
>> > >>>>> that concerns me. Further, every non-binary indicator in the
>> > >>>>> model is standardized by two standard deviations.
>> > >>>>>
>> > >>>>> For those interested, I have a reproducible (if rather large)
>> > >>>>> example below. Dropbox link to the data is here:
>> > >>>>>
>> > >>>>> https://www.dropbox.com/s/t29jfwm98tsdr71/correlated-random-effects.csv?dl=0
>> > >>>>>
>> > >>>>> In this reproducible example, y is the outcome variable and x1
>> > >>>>> and x2 are two country-wave-level predictors I allow to vary by
>> > >>>>> region. Both x1 and x2 are interval-level predictors that I
>> > >>>>> standardized to have a mean of zero and a standard deviation of
>> > >>>>> .5 (per Gelman's (2008) recommendation).
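>> > >>>>>
>> > >>>>> Concretely, that rescaling amounts to the following (what
>> > >>>>> arm::rescale() does for continuous inputs):
>> > >>>>>
>> > >>>>> x1 <- (x1 - mean(x1, na.rm = TRUE)) / (2 * sd(x1, na.rm = TRUE))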
>> > >>>>>
>> > >>>>> I estimate the following model.
>> > >>>>>
>> > >>>>> summary(M1 <- glmer(y ~ x1 + x2 + (1 | country) +
>> > >>>>>                       (1 | country:wave) + (1 + x1 + x2 | region),
>> > >>>>>                     data = subset(Data),
>> > >>>>>                     family = binomial(link = "logit")))
>> > >>>>>
>> > >>>>> The results are theoretically intuitive. I think they make
>> > >>>>> sense. However, I get a report of perfect correlation for the
>> > >>>>> varying slopes of the region random effect.
>> > >>>>>
>> > >>>>> Random effects:
>> > >>>>>  Groups       Name        Variance Std.Dev. Corr
>> > >>>>>  country:wave (Intercept) 0.15915  0.3989
>> > >>>>>  country      (Intercept) 0.32945  0.5740
>> > >>>>>  region       (Intercept) 0.01646  0.1283
>> > >>>>>               x1          0.02366  0.1538    1.00
>> > >>>>>               x2          0.13994  0.3741   -1.00 -1.00
>> > >>>>> Number of obs: 212570, groups: country:wave, 143; country, 82;
>> > >>>>> region, 6
>> > >>>>>
>> > >>>>> What should I make of this, and am I estimating this model
>> > >>>>> wrong? For what it's worth, the dotplot of the region random
>> > >>>>> effect (with conditional variance) makes sense and is
>> > >>>>> theoretically intuitive, given my data.
>> > >>>>> (http://i.imgur.com/mrnaJ77.png)
>> > >>>>>
>> > >>>>> Any help would be greatly appreciated.
>> > >>>>>
>> > >>>>> Best regards, Steve
>> > >>>>>
>> > >>> --
>> > >>> Steven V. Miller
>> > >>> Assistant Professor, Department of Political Science
>> > >>> Clemson University
>> > >>> http://svmiller.com
>> > >
>> >
>> >
>>
>