[R-sig-ME] very basic HLM question

Mon Feb 7 19:33:15 CET 2011

2011/2/7 Sebastián Daza <sebastian.daza at gmail.com>:
> Thank you Douglas.
> Do you know if there is a way to get the between-component variance (in my
> example equal to 59.004) through random effects Anova using R?

Why do you think the between-component variance is 59.004?  You are
not, as far as I can see, estimating the between-component variance.
It's not the same as the mean square, even for balanced data.

> I just want
> to get the ICC using the simple formula: ICC = variance between macro units
> / total variance.
>
> Regards.
>
> On 2/7/2011 11:44 AM, Douglas Bates wrote:
>>
>> 2011/2/7 Sebastián Daza<sebastian.daza at gmail.com>:
>>>
>>> Hi everyone,
>>> I need to get a between-component variance (e.g. random effects Anova) in
>>> order to compute by hand the intraclass correlation. However, using lmer
>>> I
>>> don't get the same results (variance component) than using random effects
>>> Anova (with SPSS). I am using a database of students, clustered on
>>> schools
>>> (there is not the same number of students by school).
>>
>> I think the issue is that the estimates of the variance components are
>> different, which would then lead to different estimates of the ICC.
>>
>> You mentioned in your original posting that the data are unbalanced
>> (i.e. there are different numbers of students in different schools).
>> The estimates returned from lmer are the REML estimates in this case.
>> Do you know how the other estimates are being calculated?  I'm not
>> sure that the naive calculation of within- and between- sums of
>> squares and equating expected mean squares with observed mean squares
>> works with unbalanced data.
>>
>>> First, I get the intraclass using ICC1 in R:
>>
>> As Paul indicated, it is more informative to say, "using ICC1 from the
>> multilevel package for R".  ICC1 is not part of the base R nor the
>> required packages.
>>
>>> summary(anova1<- aov(math ~ as.factor(schoolid), data=nels88))
>>> ICC1(anova1)
>>>
>>> [1] 0.4414491
>>
>> That is using the raw mean squares, not the estimates of the variance
>> components.  I have, thankfully, forgotten most of what I know about
>> expected and observed mean squares but I am pretty sure that those
>> don't correspond to the REML estimates, even in the balanced case.
>>
>> Consider the results from the Dyestuff data, which is part of the lme4
>> package and is balanced
>>
>>> print(fm1<- lmer(Yield ~ 1|Batch, Dyestuff))
>>
>> Linear mixed model fit by REML ['merMod']
>> Formula: Yield ~ 1 | Batch
>>    Data: Dyestuff
>> REML criterion at convergence: 319.6543
>>
>> Random effects:
>>  Groups   Name        Variance Std.Dev.
>>  Batch    (Intercept) 1764     42.00
>>  Residual             2451     49.51
>> Number of obs: 30, groups: Batch, 6
>>
>> Fixed effects:
>>             Estimate Std. Error t value
>> (Intercept)  1527.50      19.38    78.8
>>>
>>> VarCorr(fm1)
>>
>> $Batch
>>             (Intercept)
>> (Intercept)     1764.05
>> attr(,"stddev")
>> (Intercept)
>>     42.0006
>> attr(,"correlation")
>>             (Intercept)
>> (Intercept)           1
>>
>> attr(,"sc")
>> [1] 49.5101
>>>
>>> vc<- VarCorr(fm1)
>>> var1hat<- as.vector(vc$Batch)
>>> var1hat
>>
>> [1] 1764.05
>>>
>>> varhat<- attr(vc, "sc")^2
>>> varhat
>>
>> [1] 2451.25
>>>
>>> var1hat/(var1hat + varhat)
>>
>> [1] 0.4184874
>>>
>>> anova1<- aov(Yield ~ 1|Batch, Dyestuff)
>>
>> Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
>>   0 (non-NA) cases
>> In addition: Warning message:
>> In Ops.factor(1, Batch) : | not meaningful for factors
>>>
>>> anova1<- aov(Yield ~ Batch, Dyestuff)
>>> summary(anova1)
>>
>>             Df Sum Sq Mean Sq F value   Pr(>F)
>> Batch        5  56358 11271.5  4.5983 0.004398
>> Residuals   24  58830  2451.2
>>
>> You can see that the estimates of the residual variance are the same
>> for the two model fits but the mean square for batch is not the
>> estimate of the between-batch variance.
>>
>>
>>>
>>> ICC1 comes from the multilevel package. Using lmer I get:
>>>
>>> mod1<- lmer(math ~ 1 + (1|schoolid), data=nels88)
>>>
>>> Linear mixed model fit by REML
>>> Formula: math ~ 1 + (1 | schoolid)
>>>   Data: nels88
>>>  AIC  BIC logLik deviance REMLdev
>>>  1878 1888 -935.8     1875    1872
>>> Random effects:
>>>  Groups   Name        Variance Std.Dev.
>>>  schoolid (Intercept) 34.011   5.8319
>>>  Residual                72.256   8.5003
>>> Number of obs: 260, groups: schoolid, 10
>>>
>>> Fixed effects:
>>>            Estimate Std. Error t value
>>> (Intercept)   48.861      1.927   25.36
>>>
>>> The random intercept effect is 34.011. If I compute the ICC manually
>>> (according to some articles I have read) I get:
>>>
>>>> 34.011/(34.011+72.256)
>>>
>>> [1] 0.3200523
>>>
>>> According to my Anova analysis, the between-component variance should be
>>> 59.004. If I use 59.004 the formula works relatively well:
>>>
>>>  59.004/(59.004+72.256)
>>> [1] 0.449520037
>>>
>>> Not equal, but at least similar in comparison with ICC from ICC1 command.
>>> .
>>> Does anyone know why I have got those differences? How can I get the
>>> 59.004
>>> figure using R? I should get practically an identical result than using
>>> ICC1
>>> but I don't. I have used the database of that article and I have got an
>>> identical result between ICCs. But with the dataset I am using (nels88)
>>> doesn't work.
>>>
>>> This is the example of the article mentioned above:
>>>
>>> library(multilevel)
>>> base(bh1996)
>>>
>>> summary(lmer(WBEING ~ 1 + (1|GRP), data=bh1996))
>>>
>>> 0.035801/(0.035801+0.789497)
>>> 0.043379482
>>>
>>> summary(test<- aov(WBEING~as.factor(GRP),data=bh1996))
>>> ICC1(test)
>>> [1] 0.04336905
>>>
>>> Any suggestions?
>>>
>>> --
>>> Sebastián Daza
>>> sebastian.daza at gmail.com
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>
>
> --
> Sebastián Daza
> sebastian.daza at gmail.com
>