[R-sig-ME] Perfectly correlated random effects (when they shouldn't be)

Wed Jul 15 10:57:29 CEST 2015

It looks like your model is overparameterized. Specifically, although you
have a very large number of observations (N=212570),  the data do not
support the six model parameters you try to estimate for the random-effects
term of region. You can see this quickly with the rePCA() function. There
was also a convergence warning for me, but perhaps this was due to the fact
that I worked with the full set (as provided in the dropbox file), whereas
you used a subset.

As a first step, I forced the correlation parameters to zero with the
||-syntax (zcpM1 GLMM below). A LRT suggests that dropping the three
correlation parameters does not decrease the goodness of fit; Chi-sq(3) =
1.9, p=.5959.  This could very well be a consequence of the small number of
regions, as Jake pointed out; and even 18 regions may not be enough for a
reliable estimation of three correlation parameters.

Moreover, the rePCA() suggests that even the three remaining model
parameters for the random-effects term of region are quite marginally
supported by the data. In the zero-correlation parameter GLMM, the variance
component for x1 was estimated to be suspiciously small.  Therefore, in a
second step, I removed the region-related x1 variance component . Again,
according to a LRT there was no significant loss in goodness of fit;
Chi-sq(1) = 0.

The R script is inserted below. Please note that taking a correlation
parameter or a variance component out of a model does not mean that it is
zero, but that the data do not support a model that assumes that it is
different from zero.  zcpM2 (below) looks like an adequate parsimonious
GLMM for these data.

###

# Note: The original post used a subset of the data, but it is not clear
how they were selected. I used all the data.

library(lme4)
library(RePsychLing)
data <- read.csv("correlated-random-effects.csv", header=TRUE)
str(data)
data$country <- factor(data$country)
data$region <- factor(data$region)

# no convergence problem
print(summary(M1 <- glmer(y ~ x1 + x2 + (1 | country) + (1 | country:wave) +
                          (1 +  x1 + x2 | region), data=data,
family=binomial(link="logit"))))

summary(rePCA(M1))

# convergence problem
print(summary(zcpM1 <- glmer(y ~ x1 + x2 + (1 | country) + (1 |
country:wave) +
                             (1 +  x1 + x2 || region), data=data,
family=binomial(link="logit"))))
summary(rePCA(zcpM1))

# ok
print(summary(zcpM2 <- glmer(y ~ x1 + x2 + (1 | country) + (1 |
country:wave) +
                                   (1 + x2 || region), data=data,
family=binomial(link="logit"))))
summary(rePCA(zcpM2))

anova(zcpM2, zcpM1, M1)

####

On Wed, Jul 15, 2015 at 3:45 AM, svm <steven.v.miller at gmail.com> wrote:

> I considered that. I disaggregated the region random effect from 6 to 18
> (the latter of which approximates the World Bank's region classification).
> I'm still encountering the same curious issue.
>
> Random effects:
>  Groups       Name        Variance  Std.Dev. Corr
>  country:wave (Intercept) 0.1530052 0.39116
>  country      (Intercept) 0.3735876 0.61122
>  wbregion     (Intercept) 0.0137822 0.11740
>               x1        0.0009384 0.03063  -1.00
>               x2         0.0767387 0.27702  -1.00  1.00
> Number of obs: 212570, groups:  country:wave, 143; country, 82; wbregion,
> 18
>
>  For what it's worth: the model estimates fine. The results are intuitive
> and theoretically consistent. They also don't change if I were to remove
> that region random effect. I'd like to keep the region random effect (with
> varying slopes) in the model. I'm struggling with what I should think about
> the perfect correlations.
>
> On Tue, Jul 14, 2015 at 9:07 PM, Jake Westfall <jake987722 at hotmail.com>
> wrote:
>
> > Hi Steve,
> >
> >
> > I think the issue is that estimating 3 variances and 3 covariances for
> > regions is quite ambitious given that there are only 6 regions. I think
> > it's not surprising that the model has a hard time getting good estimates
> > of those parameters.
> >
> >
> > Jake
> >
> > > Date: Tue, 14 Jul 2015 20:53:01 -0400
> > > From: steven.v.miller at gmail.com
> > > To: r-sig-mixed-models at r-project.org
> > > Subject: [R-sig-ME] Perfectly correlated random effects (when they
> > shouldn't be)
> >
> > >
> > > Hi all,
> > >
> > > I'm a long-time reader and wanted to raise a question I've seen asked
> > here
> > > before about correlated random effects. Past answers I have encountered
> > on
> > > this listserv explain that perfectly correlated random effects suggest
> > > model overfitting and variances of random effects that are effectively
> > zero
> > > and can be omitted for a simpler model. In my case, I don't think
> that's
> > > what is happening here, though I could well be fitting a poor model in
> > > glmer.
> > >
> > > I'll describe the nature of the data first. I'm modeling
> individual-level
> > > survey data for countries across multiple waves and am estimating the
> > > region of the globe as a random effect as well. I have three random
> > effects
> > > (country, country-wave, and region). In the region random effect, I am
> > > allowing country-wave-level predictors to have varying slopes. My
> inquiry
> > > is whether some country-wave-level contextual indicator can have an
> > overall
> > > effect (as a fixed effect), the effect of which can vary by region. In
> > > other words: is the effect of some country-level indicator (e.g.
> > > unemployment) in a given year different for countries in Western Europe
> > > than for countries in Africa even if, on average, there is a positive
> or
> > > negative association at the individual-level? These country-wave-level
> > > predictors that I allow to vary by region are the ones reporting
> perfect
> > > correlation and I'm unsure how to interpret that (or if I'm estimating
> > the
> > > model correctly).
> > >
> > > I should also add that I have individual-level predictors as well as
> > > country-wave-level predictors, though it's the latter that concerns me.
> > > Further, every non-binary indicator in the model is standardized by two
> > > standard deviations.
> > >
> > > For those interested, I have a reproducible (if rather large) example
> > > below. Dropbox link to the data is here:
> > >
> >
> https://www.dropbox.com/s/t29jfwm98tsdr71/correlated-random-effects.csv?dl=0
> > >
> > > In this reproducible example, y is the outcome variable and x1 and x2
> are
> > > two country-wave-level predictors I allow to vary by region. Both x1
> and
> > x2
> > > are interval-level predictors that I standardized to have a mean of
> zero
> > > and a standard deviation of .5 (per Gelman's (2008) recommendation).
> > >
> > > I estimate the following model.
> > >
> > > summary(M1 <- glmer(y ~ x1 + x2 + (1 | country) + (1 | country:wave) +
> > (1 +
> > > x1 + x2 | region), data=subset(Data), family=binomial(link="logit")))
> > >
> > > The results are theoretically intuitive. I think they make sense.
> > However,
> > > I get a report of perfect correlation for the varying slopes of the
> > region
> > > random effect.
> > >
> > > Random effects:
> > > Groups Name Variance Std.Dev. Corr
> > > country:wave (Intercept) 0.15915 0.3989
> > > country (Intercept) 0.32945 0.5740
> > > region (Intercept) 0.01646 0.1283
> > > x1 0.02366 0.1538 1.00
> > > x2 0.13994 0.3741 -1.00 -1.00
> > > Number of obs: 212570, groups: country:wave, 143; country, 82; region,
> 6
> > >
> > > What should I make of this and am I estimating this model wrong? For
> what
> > > it's worth, the dotplot of the region random effect (with conditional
> > > variance) makes sense and is theoretically intuitive, given my data. (
> > > http://i.imgur.com/mrnaJ77.png)
> > >
> > > Any help would be greatly appreciated.
> > >
> > > Best regards,
> > > Steve
> > >
> > > [[alternative HTML version deleted]]
> > >
> > > _______________________________________________
> > > R-sig-mixed-models at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >
>
>
>
> --
> Steven V. Miller
> Assistant Professor
> Department of Political Science
> Clemson University
> http://svmiller.com
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>

	[[alternative HTML version deleted]]