[R-sig-ME] categorical random effects correlation in lme4

Thu May 23 09:21:46 CEST 2013

Hmm, sorry, I'm not sure in that case. Others on this list may be able
to give you better advice, but my guess is that you don't have enough
information in the data (as you originally suggested). Do you have a
mix of white and nonwhite clients for each therapist? Are the numbers
of clients per therapist very unbalanced?

- Malcolm

On 22 May 2013 19:22, Andrew McAleavey <aam239 at psu.edu> wrote:
> Hi Malcolm & Henrik & all,
>
> Thanks for the help! I've tried some of these before, but tried even more
> now. I think I'm unfortunately still stuck on this, and I think a part of
> the reason is that dropping the random intercepts seems weird but I think it
> is necessary and helpful with a categorical fixed/random interaction.
>
> I know that removing the random intercept seems weird, but my understanding
> is that eliminating the intercept when modeling a categorical random slope
> helps interpretation, but doesn't change the model substantively. The
> variance of the random slope parameter for "white" changes and the
> correlation flips to -1.000 when you add in the intercept, but the model is
> actually equivalent. That is, without the intercept in the model, the random
> white=0 effect actually is the random intercept for therapists. See model
> fit below.
>
> As I understand it, by removing the intercept when using a categorical
> random/fixed interaction effect, you get a more interpretable value, namely
> variance in the differences between the reference group and the target group
> attributable to therapists. I believe that the variance shown when the
> intercept is included is too large, since it actually describes all variance
> associated with the categorical variable as deviations from an "average"
> client who doesn't exist.
>
> When estimating the model suggested by Henrik [(1|primary_ther) +
> (0+white|primary_ther)], two things seem odd to me. First, the intercept is
> forced to be uncorrelated with the slopes - this is not theoretically
> necessary. The other is that there are three random effects being estimated
> now, which is too many (with a two level categorical effect, only one
> parameter is necessary). Unless I'm wrong about something else, I don't
> think I can treat this the way I would a continuous variable. I think this
> is redundant and See below again.
>
> So I suppose I'm still stuck with the original question, since I don't think
> these solutions hold with a categorical effect. Am I just missing something
> else, or maybe the suggestions ought to hold for categorical effects and
> it's something else causing difficulties?
>
> Thanks a lot,
> Andrew
>
> Here's the model adding back in the random intercepts but including the
> white random effect:
>> print(fm1_ml_int <- lmer(DI ~ first_di + factor(white) +
>> (factor(white)|primary_ther), rem3post, REML=F), corr=F)
> Linear mixed model fit by maximum likelihood
> Formula: DI ~ first_di + factor(white) + (factor(white) | primary_ther)
>    Data: rem3post
>   AIC  BIC logLik deviance REMLdev
>  4980 5020  -2483     4966    4983
> Random effects:
>  Groups       Name           Variance Std.Dev. Corr
>  primary_ther (Intercept)    0.041708 0.20423
>               factor(white)1     0.018995 0.13782  -1.000
>  Residual                         0.509959 0.71411
> Number of obs: 2263, groups: primary_ther, 192
>
> Fixed effects:
>                Estimate Std. Error t value
> (Intercept)       0.45138    0.06007   7.514
> first_di            0.48009    0.02360  20.338
> factor(white)1  0.01140    0.03298   0.346
>
> Here's the difference test of interest, they appear to be equivalent models:
>> anova(fm1_ml, fm1_ml_int)
> Data: rem3post
> Models:
> fm1_ml: DI ~ first_di + factor(white) + (0 + factor(white) | primary_ther)
> fm1_ml_int: DI ~ first_di + factor(white) + (factor(white) | primary_ther)
>                  Df    AIC    BIC  logLik Chisq Chi Df Pr(>Chisq)
> fm1_ml       7 4979.9 5019.9 -2482.9
> fm1_ml_int  7 4979.9 5019.9 -2482.9     0      0          1
>
> And here's a version of the model with uncorrelated intercept and slopes
> (the bigger model has a singular convergence, as anticipated in these
> slides:http://lme4.r-forge.r-project.org/slides/2011-03-16-Amsterdam/2Longitudinal.pdf):
> Formula: DI ~ 1 + (1 | primary_ther) + (0 + factor(white) | primary_ther)
>    Data: rem3post_2
>   AIC  BIC logLik deviance REMLdev
>  5027 5062  -2508     5015    5022
> Random effects:
>  Groups       Name           Variance          Std.Dev.     Corr
>  primary_ther (Intercept)       1.7240e-06  0.001313
>  primary_ther factor(white)0  4.6543e-02  0.215738
>                     factor(white)1  6.9851e-03  0.083577   0.752
>  Residual                            5.1870e-01  0.720212
> Number of obs: 2263, groups: primary_ther, 192
>
> Fixed effects:
>             Estimate Std. Error t value
> (Intercept)  1.48466    0.01791   82.91
>
>
> On Tue, May 21, 2013 at 1:30 PM, Malcolm Fairbrother
> <M.Fairbrother at bristol.ac.uk> wrote:
>>
>> Dear Andrew,
>>
>> What if you drop the "0 +" bit? So:
>>
>>  lmer(DI ~ first_di + factor(white) + (factor(white) | primary_ther),
>> rem3post, REML=F)
>>
>> or
>>
>>  lmer(DI ~ first_di + white + (white | primary_ther), rem3post, REML=F)
>>
>> Including "0 +" means you're not estimating a random intercept for
>> "primary_ther", which it sounds like you need/want. Instead, you're
>> getting two random slopes, which I guess are perfectly correlated
>> because they're two sides of the same coin (the coin being your binary
>> dummy variable "white").
>>
>> If that doesn't solve the problem, it might help for you to post the
>> results of "str(rem3post)" (i.e., your dataset) as well.
>>
>> Cheers,
>> Malcolm
>>
>>
>>
>>
>> > Date: Tue, 21 May 2013 11:41:04 -0400
>> > From: Andrew McAleavey <andrew.mcaleavey at gmail.com>
>> > To: r-sig-mixed-models at r-project.org
>> > Subject: [R-sig-ME] categorical random effects correlation in lme4
>> >
>> > Hi,
>> >
>> > I'm currently investigating a question of relative effectiveness of
>> > therapists, and the particular question is whether some therapists are
>> > differentially effective with white versus racial/ethnic minority
>> > clients
>> > (this is coded as a binary variable called "white" in this data). We
>> > have
>> > conceptualized this as a cross-level random effect, so the model has one
>> > random effect for therapist intercept and one effect for the difference
>> > in
>> > effectiveness between their white and nonwhite clients.
>> >
>> > I am relatively new to lme4, but I think I have specified the model
>> > correctly (the fixed effects represent client pretreatment severity and
>> > the
>> > nonsignificant fixed effect of binary race; they don't seem to impact
>> > the
>> > estimation problem). Here's the model of interest:
>> >>print(fm1_ml <- lmer(DI ~ first_di + white + (0 +
>> > factor(white)|primary_ther), rem3post, REML=F), corr=F)
>> >
>> > The problem is that the two random effects are appearing to correlate at
>> > r
>> > = 1.000. I think this is an estimation problem, and probably indicates
>> > that
>> > the random variables aren't accounting for all that much variance. I'm
>> > dubious of interpreting this model, therefore. However, when comparing
>> > it
>> > to the random intercepts only model using the LRT, there is a
>> > significant
>> > difference, suggesting that even though the explained variance is (very)
>> > small, it may be worth including:
>> >> anova(fm1_a_ml, fm1_ml)
>> > Data: rem3post
>> > Models:
>> > fm1_a_ml: DI ~ first_di + factor(white) + (1 | primary_ther)
>> > fm1_ml: DI ~ first_di + factor(white) + (0 + factor(white) |
>> > primary_ther)
>> >                 Df    AIC      BIC      logLik     Chisq    Chi Df
>> > Pr(>Chisq)
>> > fm1_a_ml   5    4982.7  5011.3  -2486.3
>> > fm1_ml      7    4979.9   5019.9  -2482.9  6.7871      2        0.03359
>> > *
>> >
>> > My question is basically this: How should I interpret these results?
>> > There
>> > are significant differences between therapists in terms of their
>> > relative
>> > effectiveness with white vs. nonwhite clients, but they're just small?
>> > Or
>> > is even this not justified? Would it be safer to say that there are
>> > likely
>> > no estimable differences? Am I missing something else?
>> >
>> > Thanks a lot,
>> > Andrew McAleavey
>> >
>> > Here's the model of interest output:
>> >> print(fm1_ml <- lmer(DI ~ first_di + factor(white) + (0 +
>> > factor(white)|primary_ther), rem3post, REML=F), corr=F)
>> > Linear mixed model fit by maximum likelihood
>> > Formula: DI ~ first_di + factor(white) + (0 + factor(white) |
>> > primary_ther)
>> >    Data: rem3post
>> >   AIC  BIC logLik deviance REMLdev
>> >  4980 5020  -2483     4966    4983
>> > Random effects:
>> >  Groups       Name              Variance        Std.Dev.   Corr
>> >  primary_ther factor(white)    0 0.0417050  0.204218
>> >                  factor(white)1     0.0044086     0.066397   1.000
>> >  Residual                            0.5099596     0.714115
>> > Number of obs: 2263, groups: primary_ther, 192
>> >
>> > Fixed effects:
>> >                      Estimate Std. Error t value
>> > (Intercept)        0.45138    0.06007   7.514
>> > first_di             0.48009    0.02360   20.338
>> > factor(white)1   0.01140    0.03298   0.346
>> >
>> > --
>> > Andrew McAleavey, M.S.
>> > Department of Psychology
>> > The Pennsylvania State University
>> > 346 Moore Building
>> > University Park, PA 16802
>> > aam239 at psu.edu
>
>
>
>
> --
> Andrew McAleavey, M.S.
> Department of Psychology
> The Pennsylvania State University
> 346 Moore Building
> University Park, PA 16802
> aam239 at psu.edu