[R-sig-ME] Question regarding the estimation of the correlation between random intercepts and random slopes in the lme4 package

Ben Bolker bbolker at gmail.com
Wed Jul 16 15:57:08 CEST 2014


On 14-07-16 08:37 AM, Inga Schwabe wrote:
> Dear list members,
> 
> I posted a question regarding the lme4 package in R on stackoverflow.com
> (see
> http://stackoverflow.com/questions/24778642/estimating-correlation-between-random-slopes-and-random-intercepts-using-the-lme)
> and a StackOverflow user kindly referred me to this mailing list. I
> hope that my question is appropriate for this mailing list. If it is
> not, please let me know!
> 
> This is my question:
> 
> For answering my research question I am interested in the correlation
> between the random slopes and random intercepts in a multilevel model,
> estimated using the R library lme4.
> 
> The data I have is: Y (test-scores from students), SES (socio-economic
> status for each student) and schoolid (ID for each school).
> 
> I am using the following syntax to estimate random intercepts and slopes
> for the schools:
> 
> library(lme4)
> model3 <- lmer(Y ~ SES + (1 + SES | schoolid))
> 
>  The reference I used for this syntax is this pdf:
> 
> http://www.bristol.ac.uk/cmm/learning/module-samples/5-concepts-sample.pdf
> 
> On page 19, a similar analysis is described. It says that by defining
> the random intercepts and slopes together, we implicitly specify that
> we want the random intercepts and slopes to covary. Therefore, the
> correlation between random slopes and random intercepts is estimated
> as well. Basically, exactly what I need for answering my research
> hypothesis.
> 
> However, when I look at the results:
> 
>  summary(model3)
> 
>  I am getting the following output:
> 
> Linear mixed model fit by REML ['lmerMod']
> Formula: Y ~ SES + (1 + SES | schoolid)
> 
> REML criterion at convergence: 8256.4
> 
> Scaled residuals:
>     Min      1Q  Median      3Q     Max
> -3.1054 -0.6633 -0.0028  0.6810  3.5606
> 
> Random effects:
>  Groups   Name        Variance  Std.Dev. Corr
>  schoolid (Intercept) 0.6427924 0.80174
>           SES         0.0009143 0.03024  1.00
>  Residual             0.3290902 0.57366
> Number of obs: 4376, groups: schoolid, 179
> 
> Fixed effects:
>              Estimate Std. Error t value
> (Intercept) -0.036532   0.060582  -0.603
> SES          0.062491   0.009984   6.259
> 
> Correlation of Fixed Effects:
>     (Intr)
> SES 0.226
> 
>  As stated in the output, the correlation between the random slopes and
> random intercepts equals 1.00. I find this hard to believe. When I call in
> R:
> 
> VarCorr(model3)$schoolid
> 
>  I am getting the following output which gives the correlations and
> covariance matrix:
> 
>              (Intercept)          SES
> (Intercept)  0.64279243 0.0242429680
> SES          0.02424297 0.0009143255
> 
> attr(,"stddev")
> (Intercept)         SES
>  0.80174337  0.03023782
> 
> attr(,"correlation")
>             (Intercept) SES
> (Intercept)           1   1
> SES                   1   1
> 
>  It seems as if the correlation between the slopes and intercepts was
> set to 1.00 by R. I did not see anything like this in anyone else's
> output when searching the internet for references on multilevel
> modelling.
> 
> Does anybody know what can be the cause of this correlation? Can it be that
> the correlation is set to 1.00 because otherwise the model would not be
> identified? Or is it because the variance of the random slopes is so small
> (0.0009) that the correlation can not be estimated?
> 
> I have tried to simulate data in order to provide a small reproducible
> dataset. I have, however, not yet been able to reproduce this output
> with simulated data. As soon as I have the code I will edit my post
> and add it.
> 
> 
> Many thanks for your time and effort,
> Kind regards,
> 
> Inga
> 

  I was thinking about answering on StackOverflow, but since you've
asked here: yes, your second-to-last paragraph is exactly correct.  You
don't have enough information (or equivalently there is too much noise)
in your data to uniquely identify the full variance-covariance matrix,
so the results are 'singular'; that is, one of the underlying components
is identified as zero.  Common symptoms of this situation include
variances estimated at or very near zero (although I see that in your
case the variances, while small, are *not* exactly zero) and
correlations estimated at exactly +/- 1.
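A self-contained sketch (using lme4's built-in sleepstudy data rather
than the poster's data) of how such a boundary fit can be checked
programmatically: in lme4's parameterization, a fit is singular when an
element of theta (the relative Cholesky factor of the random-effects
covariance) sits at its lower bound of zero.

```r
## Hedged sketch: detecting a boundary ("singular") fit.  The sleepstudy
## data ship with lme4, so the example is self-contained; any lmerMod fit
## (e.g. the poster's model3) can be substituted for 'fit'.
library(lme4)
fit <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

## theta holds the elements of the relative Cholesky factor of the
## random-effects covariance; the variance-like elements are bounded
## below by zero, so values at (or numerically near) that bound signal
## a singular fit
theta <- getME(fit, "theta")
lower <- getME(fit, "lower")
any(theta[lower == 0] < 1e-4)   # TRUE would indicate a (near-)singular fit
```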

http://rpubs.com/bbolker/4187 shows some simulated examples where the
estimated variance collapses to zero (despite the true, simulated model
having a non-zero among-group variance).  Presumably one could make up
similar examples (with randomized-block designs/random-slope models)
that would show analogous situations of correlations collapsing to +/- 1.
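A hedged sketch of the kind of simulation alluded to above (every name
and value below is invented for illustration): when the true slope
variance is tiny relative to the residual noise, the estimated
intercept-slope correlation frequently collapses to the +/- 1 boundary.

```r
## Invented simulation sketch: a tiny true random-slope variance often
## produces a boundary fit with an estimated correlation of +/- 1.
library(lme4)
set.seed(101)
nschool <- 20; nstud <- 10
d <- data.frame(schoolid = factor(rep(seq_len(nschool), each = nstud)),
                SES      = rnorm(nschool * nstud))
b0 <- rnorm(nschool, sd = 0.8)    # random intercepts
b1 <- rnorm(nschool, sd = 0.01)   # random slopes: tiny true variance
d$Y <- b0[d$schoolid] + (0.06 + b1[d$schoolid]) * d$SES +
       rnorm(nrow(d), sd = 0.6)
fit <- lmer(Y ~ SES + (1 + SES | schoolid), data = d)
attr(VarCorr(fit)$schoolid, "correlation")   # often exactly +/- 1 here
```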

 It's a little bit surprising that you have encountered this problem
since it is most commonly a problem with small numbers of levels in the
grouping variables, which doesn't appear to be the case here.

  See http://glmm.wikidot.com/faq#singular_fits for more information ...
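One common follow-up worth sketching (my addition, not part of the
original exchange): refit with the intercept-slope correlation
constrained to zero via lme4's double-bar syntax, and compare the two
fits. The sleepstudy data are used here only to keep the example
self-contained.

```r
## Hedged sketch: correlated vs. uncorrelated random effects.
## (Days || Subject) expands to (1 | Subject) + (0 + Days | Subject),
## i.e. independent random intercepts and slopes.
library(lme4)
m_corr   <- lmer(Reaction ~ Days + (Days  | Subject), sleepstudy)
m_nocorr <- lmer(Reaction ~ Days + (Days || Subject), sleepstudy)
anova(m_nocorr, m_corr)   # likelihood-ratio comparison (refits with ML)
```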

  Ben Bolker


