[R-sig-ME] very basic HLM question

Mon Feb 7 19:44:22 CET 2011

2011/2/7 Sebastián Daza <sebastian.daza at gmail.com>:
> I have run a one-way anova random effect in SPSS and I get that the
> between-component variance is 59.004. With this figure I can get the same
> ICC that I obtained using ICC1. I haven't found a way to get that
> between-component variance using R.

I don't know how to reproduce the SPSS results in R.

> On 2/7/2011 12:33 PM, Douglas Bates wrote:
>>
>> 2011/2/7 Sebastián Daza<sebastian.daza at gmail.com>:
>>>
>>> Thank you Douglas.
>>> Do you know if there is a way to get the between-component variance (in
>>> my
>>> example equal to 59.004) through random effects Anova using R?
>>
>> Why do you think the between-component variance is 59.004?  You are
>> not, as far as I can see, estimating the between-component variance.
>> It's not the same as the mean square, even for balanced data.
>>
>>> I just want
>>> to get the ICC using the simple formula: ICC = variance between macro
>>> units
>>> / total variance.
>>>
>>> Regards.
>>>
>>> On 2/7/2011 11:44 AM, Douglas Bates wrote:
>>>>
>>>> 2011/2/7 Sebastián Daza<sebastian.daza at gmail.com>:
>>>>>
>>>>> Hi everyone,
>>>>> I need to get a between-component variance (e.g. random effects Anova)
>>>>> in
>>>>> order to compute by hand the intraclass correlation. However, using
>>>>> lmer
>>>>> I
>>>>> don't get the same results (variance component) than using random
>>>>> effects
>>>>> Anova (with SPSS). I am using a database of students, clustered on
>>>>> schools
>>>>> (there is not the same number of students by school).
>>>>
>>>> I think the issue is that the estimates of the variance components are
>>>> different, which would then lead to different estimates of the ICC.
>>>>
>>>> You mentioned in your original posting that the data are unbalanced
>>>> (i.e. there are different numbers of students in different schools).
>>>> The estimates returned from lmer are the REML estimates in this case.
>>>> Do you know how the other estimates are being calculated?  I'm not
>>>> sure that the naive calculation of within- and between- sums of
>>>> squares and equating expected mean squares with observed mean squares
>>>> works with unbalanced data.
>>>>
>>>>> First, I get the intraclass using ICC1 in R:
>>>>
>>>> As Paul indicated, it is more informative to say, "using ICC1 from the
>>>> multilevel package for R".  ICC1 is not part of the base R nor the
>>>> required packages.
>>>>
>>>>> summary(anova1<- aov(math ~ as.factor(schoolid), data=nels88))
>>>>> ICC1(anova1)
>>>>>
>>>>> [1] 0.4414491
>>>>
>>>> That is using the raw mean squares, not the estimates of the variance
>>>> components.  I have, thankfully, forgotten most of what I know about
>>>> expected and observed mean squares but I am pretty sure that those
>>>> don't correspond to the REML estimates, even in the balanced case.
>>>>
>>>> Consider the results from the Dyestuff data, which is part of the lme4
>>>> package and is balanced
>>>>
>>>>> print(fm1<- lmer(Yield ~ 1|Batch, Dyestuff))
>>>>
>>>> Linear mixed model fit by REML ['merMod']
>>>> Formula: Yield ~ 1 | Batch
>>>>    Data: Dyestuff
>>>> REML criterion at convergence: 319.6543
>>>>
>>>> Random effects:
>>>>  Groups   Name        Variance Std.Dev.
>>>>  Batch    (Intercept) 1764     42.00
>>>>  Residual             2451     49.51
>>>> Number of obs: 30, groups: Batch, 6
>>>>
>>>> Fixed effects:
>>>>             Estimate Std. Error t value
>>>> (Intercept)  1527.50      19.38    78.8
>>>>>
>>>>> VarCorr(fm1)
>>>>
>>>> $Batch
>>>>             (Intercept)
>>>> (Intercept)     1764.05
>>>> attr(,"stddev")
>>>> (Intercept)
>>>>     42.0006
>>>> attr(,"correlation")
>>>>             (Intercept)
>>>> (Intercept)           1
>>>>
>>>> attr(,"sc")
>>>> [1] 49.5101
>>>>>
>>>>> vc<- VarCorr(fm1)
>>>>> var1hat<- as.vector(vc$Batch)
>>>>> var1hat
>>>>
>>>> [1] 1764.05
>>>>>
>>>>> varhat<- attr(vc, "sc")^2
>>>>> varhat
>>>>
>>>> [1] 2451.25
>>>>>
>>>>> var1hat/(var1hat + varhat)
>>>>
>>>> [1] 0.4184874
>>>>>
>>>>> anova1<- aov(Yield ~ 1|Batch, Dyestuff)
>>>>
>>>> Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
>>>>   0 (non-NA) cases
>>>> In addition: Warning message:
>>>> In Ops.factor(1, Batch) : | not meaningful for factors
>>>>>
>>>>> anova1<- aov(Yield ~ Batch, Dyestuff)
>>>>> summary(anova1)
>>>>
>>>>             Df Sum Sq Mean Sq F value   Pr(>F)
>>>> Batch        5  56358 11271.5  4.5983 0.004398
>>>> Residuals   24  58830  2451.2
>>>>
>>>> You can see that the estimates of the residual variance are the same
>>>> for the two model fits but the mean square for batch is not the
>>>> estimate of the between-batch variance.
>>>>
>>>>
>>>>>
>>>>> ICC1 comes from the multilevel package. Using lmer I get:
>>>>>
>>>>> mod1<- lmer(math ~ 1 + (1|schoolid), data=nels88)
>>>>>
>>>>> Linear mixed model fit by REML
>>>>> Formula: math ~ 1 + (1 | schoolid)
>>>>>   Data: nels88
>>>>>  AIC  BIC logLik deviance REMLdev
>>>>>  1878 1888 -935.8     1875    1872
>>>>> Random effects:
>>>>>  Groups   Name        Variance Std.Dev.
>>>>>  schoolid (Intercept) 34.011   5.8319
>>>>>  Residual                72.256   8.5003
>>>>> Number of obs: 260, groups: schoolid, 10
>>>>>
>>>>> Fixed effects:
>>>>>            Estimate Std. Error t value
>>>>> (Intercept)   48.861      1.927   25.36
>>>>>
>>>>> The random intercept effect is 34.011. If I compute the ICC manually
>>>>> (according to some articles I have read) I get:
>>>>>
>>>>>> 34.011/(34.011+72.256)
>>>>>
>>>>> [1] 0.3200523
>>>>>
>>>>> According to my Anova analysis, the between-component variance should
>>>>> be
>>>>> 59.004. If I use 59.004 the formula works relatively well:
>>>>>
>>>>>  59.004/(59.004+72.256)
>>>>> [1] 0.449520037
>>>>>
>>>>> Not equal, but at least similar in comparison with ICC from ICC1
>>>>> command.
>>>>> .
>>>>> Does anyone know why I have got those differences? How can I get the
>>>>> 59.004
>>>>> figure using R? I should get practically an identical result than using
>>>>> ICC1
>>>>> but I don't. I have used the database of that article and I have got an
>>>>> identical result between ICCs. But with the dataset I am using (nels88)
>>>>> doesn't work.
>>>>>
>>>>> This is the example of the article mentioned above:
>>>>>
>>>>> library(multilevel)
>>>>> base(bh1996)
>>>>>
>>>>> summary(lmer(WBEING ~ 1 + (1|GRP), data=bh1996))
>>>>>
>>>>> 0.035801/(0.035801+0.789497)
>>>>> 0.043379482
>>>>>
>>>>> summary(test<- aov(WBEING~as.factor(GRP),data=bh1996))
>>>>> ICC1(test)
>>>>> [1] 0.04336905
>>>>>
>>>>> Any suggestions?
>>>>>
>>>>> --
>>>>> Sebastián Daza
>>>>> sebastian.daza at gmail.com
>>>>>
>>>>> _______________________________________________
>>>>> R-sig-mixed-models at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>
>>>>
>>>
>>> --
>>> Sebastián Daza
>>> sebastian.daza at gmail.com
>>>
>>
>
> --
> Sebastián Daza
> sebastian.daza at gmail.com
>