[R] lmer model building--include random effects?
Ista Zahn
istazahn at gmail.com
Wed Apr 23 15:35:04 CEST 2008
On Apr 23, 2008, at 8:56 AM, Douglas Bates wrote:
> On 4/22/08, Ista Zahn <istazahn at gmail.com> wrote:
>> Hello,
>> This is a follow up question to my previous one http://tolstoy.newcastle.edu.au/R/e4/help/08/02/3600.html
>
>> I am attempting to model relationship satisfaction (MAT) scores
>> (measurements at 5 time points), using participant (spouseID) and
>> couple id (ID) as grouping variables, and time (years) and conflict
>> (MCI.c) as predictors. I have been instructed to include random
>> effects for the slopes of both predictors as well as the intercepts,
>> and then to drop non-significant random effects from the model. The
>> instructor and the rest of the class is using HLM 6.0, which gives p-
>> values for random effects, and the procedure is simply to run a
>> model,
>> note which random effects are not significant, and drop them from the
>> model. I was hoping I could to something analogous by using the anova
>> function to compare models with and without a particular random
>> effect, but I get dramatically different results than those obtained
>> with HLM 6.0.
>
>> For example, I wanted to determine if I should include a random
>> effect
>> for the variable "MCI.c" (at the couple level), so I created two
>> models, one with and one without, and compared them:
>
>>> m.3 <- lmer(MAT ~ 1 + years + MCI.c + (1 + years | spouseID) + (1 +
>> years + MCI.c | ID), data=Data, method = "ML")
>>> m.1 <- lmer(MAT ~ 1 + years + MCI.c + (1 + years + MCI.c |
>> spouseID) + (1 + years + MCI.c | ID), data=Data, method = "ML")
>>> anova(m.1, m.3)
>> Data: Data
>> Models:
>> m.3: MAT ~ 1 + years + MCI.c + (1 + years | spouseID) + (1 + years +
>> m.1: MCI.c | ID)
>> m.3: MAT ~ 1 + years + MCI.c + (1 + years + MCI.c | spouseID) + (1 +
>> m.1: years + MCI.c | ID)
>> Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
>> m.3 12 5777.8 5832.7 -2876.9
>> m.1 15 5780.9 5849.5 -2875.4 2.9428 3 0.4005
>
>> The corresponding output from HLM 6.0 reads
>
>> Random Effect Standard Variance df Chi-
>> square P-value
>> Deviation Component
>
>> ------------------------------------------------------------------------------
>> INTRCPT1, R0 6.80961 46.37075 60
>> 112.80914 0.000
>> YEARS slope, R1 1.49329 2.22991 60 59.38729
>> >.500
>> MCI slope, R2 5.45608 29.76881 60
>> 90.57615 0.007
>> ------------------------------------------------------------------------------
>
>> To me, this seems to indicate that HLM 6.0 is suggesting that the
>> random effect should be included in the model, while R is suggesting
>> that it need not be. This is not (quite) a "why do I get different
>> results with X" post, but rather an "I'm worried that I might be
>> doing
>> something wrong" post. Does what I've done look reasonable? Is
>> there a
>> better way to go about it
>
> The first thing I would try to determine is whether the model fit by
> HLM is equivalent to the model you have fit with lmer. The part of
> the HLM model output you have shown lists only variance components.
> It does not provide covariances or correlations. The lmer model is
> fitting a 3 by 3 symmetric positive definite variance-covariance
> matrix with a total of 6 parameters - 3 variances and 3 covariances.
> It may be that HLM is fitting a simpler model in which the covariances
> are all zero.
>
Yes, I was also concerned that the model I fit in R may not be exactly
the model fit by HLM. The estimates are similar but not exact. The
model summaries from HLM and R are as follows:
HLM OUTPUT:
The outcome variable is MAT
The model specified for the fixed effects was:
----------------------------------------------------
Level-1 Level-2 Level-3
Coefficients Predictors Predictors
--------------------- --------------- ----------------
INTRCPT1, P0 INTRCPT2, B00 INTRCPT3, G000
YEARS slope, P1 INTRCPT2, B10 INTRCPT3, G100
* MCI slope, P2 INTRCPT2, B20 INTRCPT3, G200
'*' - This variable has been centered around its group mean
Summary of the model specified (in equation format)
---------------------------------------------------
Level-1 Model
Y = P0 + P1*(YEARS) + P2*(MCI) + E
Level-2 Model
P0 = B00 + R0
P1 = B10 + R1
P2 = B20 + R2
Level-3 Model
B00 = G000 + U00
B10 = G100 + U10
B20 = G200 + U20
Run-time deletion has reduced the number of level-1 records to 716
For starting values, data from 716 level-1 and 120 level-2 records
were used
Iterations stopped due to small change in likelihood function
******* ITERATION 1008 *******
Sigma_squared = 110.43050
Standard Error of Sigma_squared = 7.77797
Tau(pi)
INTRCPT1,P0 46.37075 5.48151 -5.91342
YEARS,P1 5.48151 2.22991 5.80536
MCI,P2 -5.91342 5.80536 29.76881
Tau(pi) (as correlations)
INTRCPT1,P0 1.000 0.539 -0.159
YEARS,P1 0.539 1.000 0.713
MCI,P2 -0.159 0.713 1.000
Standard Errors of Tau(pi)
INTRCPT1,P0 16.35658 6.70855 14.39441
YEARS,P1 6.70855 4.81811 8.57942
MCI,P2 14.39441 8.57942 22.49454
----------------------------------------------------
Random level-1 coefficient Reliability estimate
----------------------------------------------------
INTRCPT1, P0 0.482
YEARS, P1 0.074
MCI, P2 0.202
----------------------------------------------------
Tau(beta)
INTRCPT1 YEARS MCI
INTRCPT2,B00 INTRCPT2,B10 INTRCPT2,B20
116.72824 -0.09171 14.21000
-0.09171 2.81646 -0.17231
14.21000 -0.17231 1.90065
Tau(beta) (as correlations)
INTRCPT1/INTRCPT2,B00 1.000 -0.005 0.954
YEARS/INTRCPT2,B10 -0.005 1.000 -0.074
MCI/INTRCPT2,B20 0.954 -0.074 1.000
Standard Errors of Tau(beta)
INTRCPT1 YEARS MCI
INTRCPT2,B00 INTRCPT2,B10 INTRCPT2,B20
30.50218 7.45597 15.16454
7.45597 3.73945 6.40333
15.16454 6.40333 16.72669
----------------------------------------------------
Random level-2 coefficient Reliability estimate
----------------------------------------------------
INTRCPT1/INTRCPT2, B00 0.716
YEARS/INTRCPT2, B10 0.167
MCI/INTRCPT2, B20 0.027
----------------------------------------------------
The value of the likelihood function at iteration 1008 = -2.859327E+003
The outcome variable is MAT
Final estimation of fixed effects:
----------------------------------------------------------------------------
Standard Approx.
Fixed Effect Coefficient Error T-ratio d.f. P-
value
----------------------------------------------------------------------------
For INTRCPT1, P0
For INTRCPT2, B00
INTRCPT3, G000 124.486031 1.638650 75.969 59
0.000
For YEARS slope, P1
For INTRCPT2, B10
INTRCPT3, G100 -6.257696 0.518369 -12.072 59
0.000
For MCI slope, P2
For INTRCPT2, B20
INTRCPT3, G200 -3.698127 1.052597 -3.513 59
0.001
----------------------------------------------------------------------------
The outcome variable is MAT
Final estimation of fixed effects
(with robust standard errors)
----------------------------------------------------------------------------
Standard Approx.
Fixed Effect Coefficient Error T-ratio d.f. P-
value
----------------------------------------------------------------------------
For INTRCPT1, P0
For INTRCPT2, B00
INTRCPT3, G000 124.486031 1.632622 76.249 59
0.000
For YEARS slope, P1
For INTRCPT2, B10
INTRCPT3, G100 -6.257696 0.512071 -12.220 59
0.000
For MCI slope, P2
For INTRCPT2, B20
INTRCPT3, G200 -3.698127 1.003180 -3.686 59
0.001
----------------------------------------------------------------------------
Final estimation of level-1 and level-2 variance components:
------------------------------------------------------------------------------
Random Effect Standard Variance df Chi-
square P-value
Deviation Component
------------------------------------------------------------------------------
INTRCPT1, R0 6.80961 46.37075 60
112.80914 0.000
YEARS slope, R1 1.49329 2.22991 60
59.38729 >.500
MCI slope, R2 5.45608 29.76881 60
90.57615 0.007
level-1, E 10.50859 110.43050
------------------------------------------------------------------------------
Final estimation of level-3 variance components:
------------------------------------------------------------------------------
Random Effect Standard Variance df Chi-
square P-value
Deviation Component
------------------------------------------------------------------------------
INTRCPT1/INTRCPT2, U00 10.80408 116.72824 59
208.29109 0.000
YEARS/INTRCPT2, U10 1.67823 2.81646 59
75.45893 0.073
MCI/INTRCPT2, U20 1.37864 1.90065 59
64.47689 0.291
------------------------------------------------------------------------------
Statistics for current covariance components model
--------------------------------------------------
Deviance = 5718.653097
Number of estimated parameters = 16
R OUTPUT
> m.1 <- lmer(MAT ~ 1 + years + MCI.c + (1 + years + MCI.c |
spouseID) + (1 + years + MCI.c | ID), data=Data, method = "ML")
> summary(m.1)
Linear mixed-effects model fit by maximum likelihood
Formula: MAT ~ 1 + years + MCI.c + (1 + years + MCI.c | spouseID) + (1
+ years + MCI.c | ID)
Data: Data
AIC BIC logLik MLdeviance REMLdeviance
5781 5850 -2875 5751 5746
Random effects:
Groups Name Variance Std.Dev. Corr
spouseID (Intercept) 45.90014 6.77496
years 2.52945 1.59042 0.559
MCI.c 28.31849 5.32151 -0.082 0.781
ID (Intercept) 124.32049 11.14991
years 2.82449 1.68062 -0.218
MCI.c 0.20084 0.44815 0.140 -0.533
Residual 110.96358 10.53392
number of obs: 720, groups: spouseID, 120; ID, 60
Fixed effects:
Estimate Std. Error t value
(Intercept) 124.4506 1.6757 74.27
years -6.2569 0.5211 -12.01
MCI.c -3.5637 1.0228 -3.48
Correlation of Fixed Effects:
(Intr) years
years -0.251
MCI.c -0.152 0.567
The results do differ in places, but most of the estimates are similar
so I was assuming that the differences were due to different
underlying algorithms used by the two programs. I may well have been
wrong about this.
> The next question would be exactly how HLM is calculating that
> p-value. I wonder where the 60 degrees of freedom comes from. Do you
> happen to have 60 couples in the study?
Yes, there are 60 couples in the study. I believe HLM is calculating
what Raudenbush and Bryk (2002) call a "Univariate Chi-square" which
is what they recommend for testing single parameter variance
components. For multi parameter variance components they recommend a
likelihood ratio test, which is what I believe the method I used above
gives.
Thank you for taking the time to respond to my question,
Ista
More information about the R-help
mailing list