[R-sig-ME] High correlation among random effects for longitudinal model

Sat Apr 7 17:31:13 CEST 2018

John et al., please ignore my question, I was a) confusing two things (I
wasn't using the predict() method, but rather ranef()) and b) the
correlations are high but not as high as I thought (closer to .95 than 1) -
due to an error in how I was calculating the individual-specific
predictions using ranef().

Thanks again.
Josh

On Fri, Apr 6, 2018 at 4:53 PM Joshua Rosenberg <jrosen at msu.edu> wrote:

> Dear John,
>
> Thank you so much--will use poly in the future (even in cases in which I
> might *not* use orthogonal polynomials), as it appears that the summary
> function returns helpful output (i.e., correlations fo the fixed effects),
> in addition to its other benefits.
>
> When I use the poly function, the random effects correlations are lower:
>
> Random effects:
>  Formula: ~+poly(wave, 2) | student_ID
>  Structure: General positive-definite, Log-Cholesky parametrization
>                StdDev     Corr
> (Intercept)      2.178734 (Intr) p(,2)1
> poly(wave, 2)1 279.521839 0.834
> poly(wave, 2)2 281.979199 0.751  0.987
> Residual        16.292629
>
> As are the fixed effects:
>
> Fixed effects: stwm ~ +poly(wave, 2)
>                    Value Std.Error   DF   t-value p-value
> (Intercept)      5.30155  0.231096 7070 22.940957       0
> poly(wave, 2)1 196.33516 23.481935 7070  8.361115       0
> poly(wave, 2)2 113.69005 23.591988 7070  4.819011       0
>  Correlation:
>                (Intr) p(,2)1
> poly(wave, 2)1 0.344
> poly(wave, 2)2 0.311  0.516
>
> However, when I calculate individual (i.e., group)-specific predicted
> values (i.e., BLUPs, using the predict() method, with level = 1), they are
> (very) highly correlated:
>
> # A tibble: 3 x 4
>   rowname   intercept linear quadratic
>   <chr>         <dbl>  <dbl>     <dbl>
> 1 intercept    NA      0.960     0.960
> 2 linear        0.960 NA         1.00
> 3 quadratic     0.960  1.00     NA
>
> When I calculate the same individual-specific predicted values using the
> non-orthogonal (raw) polynomials, these correlations are very nearly as
> high. At this point, I'm curious how / why these predictions are so highly
> correlated.
>
> Thank you again for your help, John, and for yours, Ben, Stuart, Jorg, and
> Thierry. Sorry if this is a beginner or otherwise ignorant question as I
> learn to work with these longitudinal / polynomial models.
> josh
>
> On Tue, Apr 3, 2018 at 1:32 PM, Fox, John <jfox at mcmaster.ca> wrote:
>
>> Dear Joshua,
>>
>> I'm chiming in late, so it's possible that someone already pointed this
>> out and I didn't notice. A better way to specify a polynomial in R is to
>> use poly() in the model formula. By default, this produces orthogonal
>> polynomial regressors (at least in the fixed effects) but the same fit to
>> the data. For example,
>>
>> > time <- 1:5
>> > X <- poly(time, 2)
>>
>> > X
>>               1          2
>> [1,] -0.6324555  0.5345225
>> [2,] -0.3162278 -0.2672612
>> [3,]  0.0000000 -0.5345225
>> [4,]  0.3162278 -0.2672612
>> [5,]  0.6324555  0.5345225
>> attr(,"coefs")
>> attr(,"coefs")$alpha
>> [1] 3 3
>>
>> attr(,"coefs")$norm2
>> [1]  1  5 10 14
>>
>> attr(,"degree")
>> [1] 1 2
>> attr(,"class")
>> [1] "poly"   "matrix"
>>
>> > colSums(X)
>>            1            2
>> 0.000000e+00 1.110223e-16
>>
>> > crossprod(X)
>>               1             2
>> 1  1.000000e+00 -1.110223e-16
>> 2 -1.110223e-16  1.000000e+00
>>
>> My guess is that this will also reduce the correlations among the random
>> effects. If you really must have raw polynomials, then poly(time, 2,
>> raw=TRUE) offers the advantage that model-structure-aware functions can
>> understand that the linear and quadratic regressors are part of the same
>> term in the model.
>>
>> Whether high correlations among the random effects are really a problem,
>> my guess is that they aren't, because lme() uses a log-Cholesky
>> factorization of the random-effects covariance matrix anyway. In some
>> contexts, high correlations might produce numerical instability, but, as I
>> said, probably not here. Ben would know.
>>
>> I hope this helps,
>>  John
>>
>> -----------------------------
>> John Fox, Professor Emeritus
>> McMaster University
>> Hamilton, Ontario, Canada
>> Web: socialsciences.mcmaster.ca/jfox/
>>
>>
>>
>>
>>
>> > -----Original Message-----
>> > From: R-sig-mixed-models [mailto:
>> r-sig-mixed-models-bounces at r-project.org]
>> > On Behalf Of Joshua Rosenberg
>> > Sent: Monday, April 2, 2018 5:33 PM
>> > To: Ben Bolker <bbolker at gmail.com>
>> > Cc: R SIG Mixed Models <r-sig-mixed-models at r-project.org>
>> > Subject: Re: [R-sig-ME] High correlation among random effects for
>> longitudinal
>> > model
>> >
>> > Dear Stuart and Ben,
>> >
>> > Thank you, this worked to significantly reduce the correlations between
>> the
>> > intercept and the linear and quadratic terms (though still quite high
>> between the
>> > linear and quadratic term):
>> >
>> > Random effects:
>> >  Formula: ~time + I(time^2) | student_ID
>> >  Structure: General positive-definite, Log-Cholesky parametrization
>> >             StdDev    Corr
>> > (Intercept) 18.671959 (Intr) time
>> > time        11.029842 -0.262
>> > I(time^2)    8.359834 -0.506  0.959
>> > Residual    29.006598
>> >
>> > Could I ask if that correlation between the linear (time) and quadratic
>> > I(time^2) terms is cause for concern - and if so, how to think about
>> > (potentially) addressing this?
>> > Josh
>> >
>> > On Sun, Apr 1, 2018 at 12:34 PM Ben Bolker <bbolker at gmail.com> wrote:
>> >
>> > > On Sun, Apr 1, 2018 at 12:20 PM, Stuart Luppescu <lupp at uchicago.edu>
>> > > wrote:
>> > > > On Sun, 2018-04-01 at 12:55 +0000, Joshua Rosenberg wrote:
>> > > >> lme(outcome ~ time + I(time^2),
>> > > >>     random = ~ time + I(time^2),
>> > > >>     correlation = corAR1(form = ~ time | individual_ID),
>> > > >>     data = d_grouped)
>> > > >>
>> > > >> I have a question / concerns about the random effects, as they are
>> > > >> highly correlated (intercept and linear term = -.95; intercept and
>> > > >> quadratic term = .96; linear term and quadratic term = -.995):
>> > > >
>> > > > I think this is an ordinary occurrence for the intercept and time
>> > > > trend to be negatively correlated. The way to avoid this is to
>> > > > center the time variable at a point in the middle of the series, so,
>> > > > instead of setting the values of time to {0, 1, 2, 3, 4} use {-2,
>> -1, 0, 1, 2}.
>> > > >
>> > >
>> > >   Agreed.  This is closely related, but not identical to, the
>> > > phenomenon where the *fixed effects* are highly correlated.
>> > >
>> > > > --
>> > > > Stuart Luppescu
>> > > > Chief Psychometrician (ret.)
>> > > > UChicago Consortium on School Research
>> > > > http://consortium.uchicago.edu
>> > > >
>> > > > _______________________________________________
>> > > > R-sig-mixed-models at r-project.org mailing list
>> > > > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>> > >
>> > > _______________________________________________
>> > > R-sig-mixed-models at r-project.org mailing list
>> > > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>> > >
>> > --
>> > Joshua Rosenberg, Ph.D. Candidate
>> > Educational Psychology & Educational Technology Michigan State
>> University
>> > http://jmichaelrosenberg.com
>> >
>> >       [[alternative HTML version deleted]]
>> >
>> > _______________________________________________
>> > R-sig-mixed-models at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>
>
>
> --
> Joshua Rosenberg, Ph.D. Candidate
> Educational Psychology & Educational Technology
> Michigan State University
> http://jmichaelrosenberg.com
>
-- 
Joshua Rosenberg, Ph.D. Candidate
Educational Psychology & Educational Technology
Michigan State University
http://jmichaelrosenberg.com

	[[alternative HTML version deleted]]