[R-sig-ME] Perfectly correlated random effects (when they shouldn't be)

Wed Jul 15 19:44:25 CEST 2015

I appreciate you running the models in Bayesian packages. I've thought
about going that route, though I have no experience fitting these models in
Stan or JAGS or some other package.

As for the slow compile: I took some advice from the vignette on CRAN about
performance optimization (even at the expense of suboptimal estimates). I
wish I knew how to parallelize the model, though that's a different topic.
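
A minimal sketch of the kind of speed-for-accuracy trade-off described
above. The exact options used are not stated in this thread, so the ones
shown here are assumptions: nAGQ = 0 and calc.derivs = FALSE are two lme4
settings often suggested for faster glmer fits. The data are simulated, and
the names toy, g and x are made up for illustration.

```r
library(lme4)

set.seed(1)

# Simulated toy data: 20 groups of 50 binary observations each.
n_groups <- 20
n_per    <- 50
g   <- factor(rep(seq_len(n_groups), each = n_per))
x   <- rnorm(n_groups * n_per)
u   <- rnorm(n_groups, sd = 0.5)            # true group intercepts
eta <- -0.2 + 0.4 * x + u[as.integer(g)]    # linear predictor
toy <- data.frame(y = rbinom(length(eta), 1, plogis(eta)), x = x, g = g)

# nAGQ = 0 uses a faster but less exact estimation of the fixed effects;
# calc.derivs = FALSE skips the derivative-based convergence checks
# that run after optimization.
fit_fast <- glmer(y ~ x + (1 | g), data = toy,
                  family = binomial(link = "logit"),
                  nAGQ = 0,
                  control = glmerControl(calc.derivs = FALSE))

print(fixef(fit_fast))
```

Both settings trade some accuracy or diagnostics for speed, so it may be
worth refitting the final model with the defaults.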

On Wed, Jul 15, 2015 at 12:52 PM, Paul Buerkner <paul.buerkner at gmail.com>
wrote:

> If you look at the results from a Bayesian perspective, it seems to be a
> typical "problem" of ML procedures estimating the mode.
>
> The mode is nothing special, just the point where the density is maximal.
> When you have a skewed distribution (as usual for correlations) the mode will
> often be close to the borders of the region of definition (-1 or 1 in this
> case). The posterior distribution of the correlation, however, can still be
> very wide ranging from strong negative correlation to strong positive
> correlation, especially when the number of levels of a grouping factor is
> not that large. In those cases, zero (i.e. insignificant) correlation is a
> very likely value even if the mode itself is extreme.
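
The point about few grouping levels can be illustrated with a small base-R
simulation. This is only a frequentist analogue of the argument, not a
posterior from the posted model: it assumes the group-level effects could be
observed directly and that their true correlation is zero. The names
n_groups and n_sims are made up for the sketch.

```r
set.seed(2015)

n_groups <- 6      # as many regions as in the first model
n_sims   <- 5000   # arbitrary number of replications

# Draw two independent effects per group (true correlation = 0) and
# record the sample correlation across the groups, many times over.
r <- replicate(n_sims, cor(rnorm(n_groups), rnorm(n_groups)))

# Central 95% range of the simulated sample correlations.
q <- quantile(r, c(0.025, 0.975))
print(round(q, 2))
```

Even in this idealized setting the range covers most of the interval from
-1 to 1, so six groups carry very little information about a correlation.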
>
> I tried fitting your models with Bayesian R packages (brms and MCMCglmm).
> Unfortunately, because you have so many observations and quite a few random
> effects, they run relatively slowly, so I am still waiting for the results.
>
> 2015-07-15 3:45 GMT+02:00 svm <steven.v.miller at gmail.com>:
>
>> I considered that. I disaggregated the region random effect from 6 to 18
>> (the latter of which approximates the World Bank's region classification).
>> I'm still encountering the same curious issue.
>>
>> Random effects:
>>  Groups       Name        Variance  Std.Dev. Corr
>>  country:wave (Intercept) 0.1530052 0.39116
>>  country      (Intercept) 0.3735876 0.61122
>>  wbregion     (Intercept) 0.0137822 0.11740
>>               x1          0.0009384 0.03063  -1.00
>>               x2          0.0767387 0.27702  -1.00  1.00
>> Number of obs: 212570, groups:  country:wave, 143; country, 82; wbregion, 18
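
One way to read the correlations of -1.00 and 1.00 in the table above is to
rebuild the covariance matrix they imply and look at its eigenvalues. The
sketch below uses only the printed (rounded) standard deviations and
correlations for wbregion, so it assumes the correlations are exactly -1 and
1; the object names are made up.

```r
# Standard deviations and correlations as printed for wbregion.
sds  <- c(intercept = 0.11740, x1 = 0.03063, x2 = 0.27702)
cors <- matrix(c( 1, -1, -1,
                 -1,  1,  1,
                 -1,  1,  1),
               nrow = 3, byrow = TRUE,
               dimnames = list(names(sds), names(sds)))

# Covariance matrix implied by those numbers: D %*% R %*% D.
Sigma <- diag(sds) %*% cors %*% diag(sds)

# One eigenvalue carries all of the variance; the other two are zero
# up to floating-point error, i.e. the matrix has rank 1.
ev <- eigen(Sigma, symmetric = TRUE)$values
print(round(ev, 6))
print(sum(ev > 1e-8))  # number of non-negligible dimensions
```

A rank-1 matrix means the three region-level effects are estimated as
multiples of a single underlying dimension, which is what a singular
(boundary) fit looks like.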
>>
>> For what it's worth: the model estimates fine. The results are intuitive
>> and theoretically consistent. They also don't change if I remove that
>> region random effect. I'd like to keep the region random effect (with
>> varying slopes) in the model. I'm struggling with what I should think
>> about the perfect correlations.
>>
>> On Tue, Jul 14, 2015 at 9:07 PM, Jake Westfall <jake987722 at hotmail.com>
>> wrote:
>>
>> > Hi Steve,
>> >
>> >
>> > I think the issue is that estimating 3 variances and 3 covariances for
>> > regions is quite ambitious given that there are only 6 regions. I think
>> > it's not surprising that the model has a hard time getting good
>> > estimates of those parameters.
>> >
>> >
>> > Jake
>> >
>> > > Date: Tue, 14 Jul 2015 20:53:01 -0400
>> > > From: steven.v.miller at gmail.com
>> > > To: r-sig-mixed-models at r-project.org
>> > > Subject: [R-sig-ME] Perfectly correlated random effects (when they shouldn't be)
>> >
>> > >
>> > > Hi all,
>> > >
>> > > I'm a long-time reader and wanted to raise a question I've seen asked
>> > > here before about correlated random effects. Past answers I have
>> > > encountered on this listserv explain that perfectly correlated random
>> > > effects suggest model overfitting and variances of random effects that
>> > > are effectively zero and can be omitted for a simpler model. I don't
>> > > think that's what is happening in my case, though I could well be
>> > > fitting a poor model in glmer.
>> > >
>> > > I'll describe the nature of the data first. I'm modeling
>> > > individual-level survey data for countries across multiple waves and am
>> > > estimating the region of the globe as a random effect as well. I have
>> > > three random effects (country, country-wave, and region). In the region
>> > > random effect, I am allowing country-wave-level predictors to have
>> > > varying slopes. My inquiry is whether some country-wave-level contextual
>> > > indicator can have an overall effect (as a fixed effect), the effect of
>> > > which can vary by region. In other words: is the effect of some
>> > > country-level indicator (e.g. unemployment) in a given year different
>> > > for countries in Western Europe than for countries in Africa even if, on
>> > > average, there is a positive or negative association at the individual
>> > > level? These country-wave-level predictors that I allow to vary by
>> > > region are the ones reporting perfect correlation and I'm unsure how to
>> > > interpret that (or if I'm estimating the model correctly).
>> > >
>> > > I should also add that I have individual-level predictors as well as
>> > > country-wave-level predictors, though it's the latter that concern me.
>> > > Further, every non-binary indicator in the model is standardized by two
>> > > standard deviations.
>> > >
>> > > For those interested, I have a reproducible (if rather large) example
>> > > below. Dropbox link to the data is here:
>> > >
>> > > https://www.dropbox.com/s/t29jfwm98tsdr71/correlated-random-effects.csv?dl=0
>> > >
>> > > In this reproducible example, y is the outcome variable and x1 and x2
>> > > are two country-wave-level predictors I allow to vary by region. Both x1
>> > > and x2 are interval-level predictors that I standardized to have a mean
>> > > of zero and a standard deviation of .5 (per Gelman's (2008)
>> > > recommendation).
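
The rescaling described above (mean of zero, standard deviation of .5)
amounts to centering and dividing by two standard deviations. A minimal
base-R sketch follows; the helper name rescale_2sd and the example values
are made up and are not taken from the posted data.

```r
# Center a numeric vector and divide by two standard deviations,
# so the result has mean 0 and standard deviation 0.5.
rescale_2sd <- function(x) {
  (x - mean(x, na.rm = TRUE)) / (2 * sd(x, na.rm = TRUE))
}

# Made-up example values for a country-wave-level indicator.
unemployment <- c(4.2, 7.9, 5.5, 12.1, 9.3, 6.0)
z <- rescale_2sd(unemployment)

print(mean(z))  # zero up to floating-point error
print(sd(z))    # 0.5
```

On this scale a one-unit change in the predictor corresponds to a change of
two standard deviations on the original scale.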
>> > >
>> > > I estimate the following model.
>> > >
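>> > > library(lme4)
>> > > # Assumes the CSV linked above is saved in the working directory.
>> > > Data <- read.csv("correlated-random-effects.csv")
>> > > summary(M1 <- glmer(y ~ x1 + x2 + (1 | country) + (1 | country:wave) +
>> > >                       (1 + x1 + x2 | region),
>> > >                     data = Data, family = binomial(link = "logit")))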
>> > >
>> > > The results are theoretically intuitive. I think they make sense.
>> > > However, I get a report of perfect correlation for the varying slopes of
>> > > the region random effect.
>> > >
>> > > Random effects:
>> > >  Groups       Name        Variance Std.Dev. Corr
>> > >  country:wave (Intercept) 0.15915  0.3989
>> > >  country      (Intercept) 0.32945  0.5740
>> > >  region       (Intercept) 0.01646  0.1283
>> > >               x1          0.02366  0.1538    1.00
>> > >               x2          0.13994  0.3741   -1.00 -1.00
>> > > Number of obs: 212570, groups: country:wave, 143; country, 82; region, 6
>> > >
>> > > What should I make of this and am I estimating this model wrong? For
>> > > what it's worth, the dotplot of the region random effect (with
>> > > conditional variance) makes sense and is theoretically intuitive, given
>> > > my data. (http://i.imgur.com/mrnaJ77.png)
>> > >
>> > > Any help would be greatly appreciated.
>> > >
>> > > Best regards,
>> > > Steve
>> > >
>> > > [[alternative HTML version deleted]]
>> > >
>> > > _______________________________________________
>> > > R-sig-mixed-models at r-project.org mailing list
>> > > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>> >
>>
>>
>>
>> --
>> Steven V. Miller
>> Assistant Professor
>> Department of Political Science
>> Clemson University
>> http://svmiller.com
>>
>>
>
>
