[R-sig-ME] Understanding variance components
Chen, Gang (NIH/NIMH) [C]
gangchen at mail.nih.gov
Sun Jan 22 22:48:31 CET 2017
Thanks René for the comment. I'm still puzzled by the fact that the variance decomposition cannot seem to be directly reconciled through the two models themselves, and I hope that someone can offer a better way to interpret this.
Gang
________________________________
From: René [bimonosom at gmail.com]
Sent: Thursday, January 19, 2017 10:55 AM
To: Chen, Gang (NIH/NIMH) [C]
Cc: Henrik Singmann; r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Understanding variance components
Hey Gang, hey Henrik (!) :)
very insightful query, hope it is okay that I briefly join it, on the last question.
Variance is a result of division of squared deviations by sample size. In the complete sample, this means divided by 240; and in the gender samples by 120 each.
Have a look on the reversed equations for obtaining the overall variance by combining gender groups, weighted by sample size:
(120 * 0.9944589 + 4.137316 * 120)/240 = overall variance (should be equal to) = (0.9944589 + 4.137316)/2 acording to your equation gang
left hand can be written as:
120 * (0.9944589 + 4.137316) / 240
which is:
(0.9944589 + 4.137316)/2
so nothing to worry, I think :)
Best wishes,
René
2017-01-19 16:14 GMT+01:00 Chen, Gang (NIH/NIMH) [C] <gangchen at mail.nih.gov<mailto:gangchen at mail.nih.gov>>:
Nice, Henrik!
One thing we need to resolve, though, is this.
====================
The variances for model m1:
(1) intercept
summary(m1)$varcor$id[1]
[1] 2.565887
(2) residuals
attr(summary(m1)$varcor, "sc")^2
[1] 2.196212
====================
And the variances for model m2:
(1) female
summary(m2)$varcor$id[1]
[1] 0.9944589
(2) male
summary(m2)$varcor$id.1[1]
[1] 4.137316
(3) residuals
attr(summary(m2)$varcor, "sc")^2
[1] 2.196212
====================
It’s great to see that the residual variance matches between the two models m1 and m2. However, the intercept variance in m1, 2.565887, is not equal to the sum of female and male variances, 0.9944589 + 4.137316 = 5.131775. However, if we divide the total variance of female and male by 2, we have (0.9944589 + 4.137316)/2 = 2.565887. Why is that?
If we code the two groups as
obk.long$gender_F <- sqrt(2)*as.numeric(obk.long$gender == "F")
obk.long$gender_M <- sqrt(2)*as.numeric(obk.long$gender == "M”)
then we have the desired result,
m2 <- lmer(value ~ gender*phase+(0+gender_F|id)+(0+gender_M|id), data=obk.long)
summary(m2)$varcor$id[1]+summary(m2)$varcor$id.1[1]
[1] 2.565888
Even though the variance part is reconciled, I cannot come up with a good explanation as to why this coding strategy is required. Any thought?
Thanks,
Gang
On Jan 19, 2017, at 6:13 AM, Henrik Singmann <singmann at psychologie.uzh.ch<mailto:singmann at psychologie.uzh.ch><mailto:singmann at psychologie.uzh.ch<mailto:singmann at psychologie.uzh.ch>>> wrote:
Hi Gang,
I have an idea which is based on the last example given on ?lmer:
## Fit sex-specific variances by constructing numeric dummy variables
...
I am not sure if this is entirely correct, but it looks good to me. If not, hopefully someone more knowledgeable will jump in.
## Original model without gender specific variance:
data(obk.long, package = "afex")
m1 <- lmer(value ~ gender*phase+(1|id), data=obk.long)
summary(m1)$varcor
## Groups Name Std.Dev.
## id (Intercept) 1.6018
## Residual 1.4820
REMLcrit(m1)
## [1] 911.1599
## to get gender specific vari8ances, we construct two dummy variables:
obk.long$gender_F <- as.numeric(obk.long$gender == "F")
obk.long$gender_M <- as.numeric(obk.long$gender == "M")
m2 <- lmer(value ~ gender*phase+(0+gender_F|id)+(0+gender_M|id), data=obk.long)
summary(m2)$varcor
## Groups Name Std.Dev.
## id gender_F 0.99723
## id.1 gender_M 2.03404
## Residual 1.48196
REMLcrit(m2)
## [1] 908.297
So far, looks reasonably close. Same for the conditional modes (thx Phillip). Left two columns is separate, right is joint variance.
cbind(ranef(m2)$id, rep(NA, 16), ranef(m1)$id)
## gender_F gender_M rep(NA, 16) (Intercept)
## 1 0.0000000 -3.0986753 NA -3.0351426
## 2 0.0000000 -2.1328544 NA -2.0891242
## 3 0.0000000 0.1207276 NA 0.1182523
## 4 -0.9806229 0.0000000 NA -1.0642708
## 5 -0.3995130 0.0000000 NA -0.4335918
## 6 0.0000000 2.6962499 NA 2.6409683
## 7 0.0000000 1.4084888 NA 1.3796103
## 8 -0.3995130 0.0000000 NA -0.4335918
## 9 -0.6900680 0.0000000 NA -0.7489313
## 10 0.0000000 0.4426679 NA 0.4335918
## 11 0.0000000 -1.1670336 NA -1.1431057
## 12 0.0000000 1.7304291 NA 1.6949498
## 13 1.3438166 0.0000000 NA 1.4584452
## 14 -0.6900680 0.0000000 NA -0.7489313
## 15 0.4721518 0.0000000 NA 0.5124267
## 16 1.3438166 0.0000000 NA 1.4584452
And, finally, the same for the fixed effects:
fixef(m1)
## (Intercept) genderM phasepost
## 6.000000e+00 7.500000e-01 -6.250000e-01
## phasepre genderM:phasepost genderM:phasepre
## -2.000000e+00 -1.243450e-15 -1.687539e-15
fixef(m2)
## (Intercept) genderM phasepost
## 6.000000e+00 7.500000e-01 -6.250000e-01
## phasepre genderM:phasepost genderM:phasepre
## -2.000000e+00 1.829648e-14 1.820766e-14
Hope that helps,
Henrik
On Jan 18, 2017, at 5:18 PM, Chen, Gang (NIH/NIMH) [C] <gangchen at mail.nih.gov<mailto:gangchen at mail.nih.gov><mailto:gangchen at mail.nih.gov<mailto:gangchen at mail.nih.gov>>> wrote:
Happy New Year, Henrik! Thanks for explaining the details. A couple of days after I posted the question, I realized that my question was silly! Once I laid out the LME model equation, my original confusion was resolved.
Actually I meant to ask a slightly different question. Let me use the dataset embedded in your ‘afex’ package as an example:
data(obk.long, package = "afex”)
Suppose that my base model is
lmer(value ~ gender*phase+(1|id), data=obk.long)
Is there a way to specify a different variance for each gender in one model?
Thanks,
Gang
On Jan 13, 2017, at 12:06 PM, Henrik Singmann <singmann at psychologie.uzh.ch<mailto:singmann at psychologie.uzh.ch><mailto:singmann at psychologie.uzh.ch<mailto:singmann at psychologie.uzh.ch>>> wrote:
Hi Gang,
Sorry that I so am late to the party, but in case you are still interested I will reply (and, of course, for the archive).
The answer is basically given in the old faq:
http://glmm.wikidot.com/faq#toc27
(1|site/block) = (1|site)+(1|site:block)
Which is exactly what is given in your output. A random intercept for Worker and a random intercept for each worker:Machine interaction.
To answer your questions. The random intercepts do not have base or reference levels. They are increments or decrements to the overall intercept for each level of Worker or the Machine:Worker combination. The reported variance is the estimated variance of these increments, which is most likely unequal to the actual variance you would obtain by calculating it from the estimated increments, which are sometimes called BLUPs (I wonder if a better term for those exist).
Hope that helps,
Henrik
PS: Belated Happy New Year to everyone.
Am 05.01.2017 um 17:28 schrieb Chen, Gang (NIH/NIMH) [C]:
Suppose that I have the following dataset in R:
library(lme4)
data(Machines,package="nlme")
mydata <- Machines[Machines$Machine!='C’,]
With the following model:
print(lmer(score ~ 1 + (1|Worker/Machine), data=mydata), ranef.comp="Var")
I have the variance components as shown below:
Random effects:
Groups Name Variance
Machine:Worker (Intercept) 46.00
Worker (Intercept) 13.84
Residual 1.16
I have trouble understanding exactly what the first two components are: Machine:Worker and Worker? Specifically,
1) What is the variance for Worker: corresponding to the base (or reference) level of the factor Machine? If so, what is the base level: the first level in the dataset or alphabetically the first level (it happens to be the same in this particular dataset)?
2) What is the variance for Machine:Worker? Is it the variance for the second level of the factor Machine, or the extra variance relative to the variance for Worker?
Furthermore, for the model:
print(lmer(score ~ 1 + (1|Worker/Machine), data=Machines), ranef.comp="Var")
what is the variance for Machine:Worker in the following result since there are 3 levels involved in the factor Machine?
Random effects:
Groups Name Variance
Machine:Worker (Intercept) 60.2972
Worker (Intercept) 7.3959
Residual 0.9246
Thanks,
Gang
_______________________________________________
R-sig-mixed-models at r-project.org<mailto:R-sig-mixed-models at r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________
R-sig-mixed-models at r-project.org<mailto:R-sig-mixed-models at r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________
R-sig-mixed-models at r-project.org<mailto:R-sig-mixed-models at r-project.org><mailto:R-sig-mixed-models at r-project.org<mailto:R-sig-mixed-models at r-project.org>> mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
[[alternative HTML version deleted]]
_______________________________________________
R-sig-mixed-models at r-project.org<mailto:R-sig-mixed-models at r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
[[alternative HTML version deleted]]
More information about the R-sig-mixed-models
mailing list