[R-sig-ME] Understanding variance components

Sun Jan 22 22:48:31 CET 2017

Thanks René for the comment. I'm still puzzled by the fact that the variance decomposition cannot seem to be directly reconciled through the two models themselves, and I hope that someone can offer a better way to interpret this.

Gang
________________________________
From: René [bimonosom at gmail.com]
Sent: Thursday, January 19, 2017 10:55 AM
To: Chen, Gang (NIH/NIMH) [C]
Cc: Henrik Singmann; r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Understanding variance components

Hey Gang, hey Henrik (!) :)

Very insightful query; I hope it is okay that I briefly join in on the last question.
A variance is the sum of squared deviations divided by the sample size. For the complete sample that means dividing by 240, and for each gender sample by 120.
Have a look at the equation run in reverse, obtaining the overall variance by combining the gender groups, weighted by sample size:

(120 * 0.9944589 + 120 * 4.137316) / 240 = overall variance
which should be equal to (0.9944589 + 4.137316) / 2 according to your equation, Gang.
The left-hand side can be written as:
120 * (0.9944589 + 4.137316) / 240
which is:
(0.9944589 + 4.137316) / 2

So nothing to worry about, I think :)
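
For the archive, the arithmetic above can be checked in a few lines of R. This is only a minimal sketch: it uses the two variance estimates quoted in this thread and the group sizes of 120 stated in the message, and the variable names are made up for illustration.

```r
# Variance estimates for the two gender groups, as reported for m2 in this thread
var_F <- 0.9944589
var_M <- 4.137316

# Group sizes as stated in the message above (balanced design)
n_F <- 120
n_M <- 120

# Pooled variance, weighted by group size
pooled <- (n_F * var_F + n_M * var_M) / (n_F + n_M)

# With equal group sizes the weights cancel, leaving the simple average
simple_avg <- (var_F + var_M) / 2

all.equal(pooled, simple_avg)
```

With unequal group sizes the two quantities would generally differ, so the agreement relies on the design being balanced.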

Best wishes,
René

2017-01-19 16:14 GMT+01:00 Chen, Gang (NIH/NIMH) [C] <gangchen at mail.nih.gov>:
Nice, Henrik!

One thing we need to resolve, though, is this.

====================
The variances for model m1:

(1) intercept

summary(m1)$varcor$id[1]
[1] 2.565887

(2) residuals

attr(summary(m1)$varcor, "sc")^2
[1] 2.196212

====================
And the variances for model m2:

(1) female

summary(m2)$varcor$id[1]
[1] 0.9944589

(2) male

summary(m2)$varcor$id.1[1]
[1] 4.137316

(3) residuals

attr(summary(m2)$varcor, "sc")^2
[1] 2.196212
====================

It’s great to see that the residual variance matches between the two models m1 and m2. However, the intercept variance in m1, 2.565887, is not equal to the sum of the female and male variances, 0.9944589 + 4.137316 = 5.131775. Yet if we divide that sum by 2, we get (0.9944589 + 4.137316)/2 = 2.565887. Why is that?

If we code the two groups as

obk.long$gender_F <- sqrt(2)*as.numeric(obk.long$gender == "F")
obk.long$gender_M <- sqrt(2)*as.numeric(obk.long$gender == "M")

then we have the desired result,

m2 <- lmer(value ~ gender*phase+(0+gender_F|id)+(0+gender_M|id), data=obk.long)
summary(m2)$varcor$id[1]+summary(m2)$varcor$id.1[1]
[1] 2.565888

Even though the variance part is reconciled, I cannot come up with a good explanation as to why this coding strategy is required. Any thoughts?
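
One possible reading, offered as a sketch rather than a definitive answer: when a dummy variable is multiplied by a constant c, the random term becomes c*b, whose variance is c^2 * Var(b), so the variance reported for b is the original one divided by c^2. With c = sqrt(2), each gender-specific variance is halved, and the sum of the two halves is the average that matches m1. The snippet below only redoes this arithmetic with the estimates quoted above; it does not refit any model, and the variable names are hypothetical.

```r
# Gender-specific variance estimates with 0/1 dummies, as reported for m2
var_F <- 0.9944589
var_M <- 4.137316

# Scaling factor applied to both dummy variables
scale_c <- sqrt(2)

# A random effect b multiplied by a covariate equal to scale_c contributes
# scale_c^2 * Var(b), so the reported Var(b) shrinks by scale_c^2
var_F_scaled <- var_F / scale_c^2
var_M_scaled <- var_M / scale_c^2

# Sum of the rescaled variances; compare with the m1 intercept variance
var_F_scaled + var_M_scaled
```

On this reading the sqrt(2) coding is not required by the model; it just rescales the two variances so that their sum, rather than their mean, lines up with the single variance in m1.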

Thanks,
Gang

On Jan 19, 2017, at 6:13 AM, Henrik Singmann <singmann at psychologie.uzh.ch> wrote:

Hi Gang,

I have an idea which is based on the last example given on ?lmer:
## Fit sex-specific variances by constructing numeric dummy variables
...

I am not sure if this is entirely correct, but it looks good to me. If not, hopefully someone more knowledgeable will jump in.

## Original model without gender specific variance:
data(obk.long, package = "afex")

m1 <- lmer(value ~ gender*phase+(1|id), data=obk.long)
summary(m1)$varcor
## Groups   Name        Std.Dev.
## id       (Intercept) 1.6018
## Residual             1.4820
REMLcrit(m1)
## [1] 911.1599

## To get gender-specific variances, we construct two dummy variables:
obk.long$gender_F <- as.numeric(obk.long$gender == "F")
obk.long$gender_M <- as.numeric(obk.long$gender == "M")
m2 <- lmer(value ~ gender*phase+(0+gender_F|id)+(0+gender_M|id), data=obk.long)
summary(m2)$varcor
## Groups   Name     Std.Dev.
## id       gender_F 0.99723
## id.1     gender_M 2.03404
## Residual          1.48196
REMLcrit(m2)
## [1] 908.297

So far, this looks reasonably close. The same holds for the conditional modes (thanks, Phillip). The left two columns are from the model with separate variances, the right column from the model with a joint variance.
cbind(ranef(m2)$id, rep(NA, 16), ranef(m1)$id)
##      gender_F   gender_M rep(NA, 16) (Intercept)
## 1   0.0000000 -3.0986753          NA  -3.0351426
## 2   0.0000000 -2.1328544          NA  -2.0891242
## 3   0.0000000  0.1207276          NA   0.1182523
## 4  -0.9806229  0.0000000          NA  -1.0642708
## 5  -0.3995130  0.0000000          NA  -0.4335918
## 6   0.0000000  2.6962499          NA   2.6409683
## 7   0.0000000  1.4084888          NA   1.3796103
## 8  -0.3995130  0.0000000          NA  -0.4335918
## 9  -0.6900680  0.0000000          NA  -0.7489313
## 10  0.0000000  0.4426679          NA   0.4335918
## 11  0.0000000 -1.1670336          NA  -1.1431057
## 12  0.0000000  1.7304291          NA   1.6949498
## 13  1.3438166  0.0000000          NA   1.4584452
## 14 -0.6900680  0.0000000          NA  -0.7489313
## 15  0.4721518  0.0000000          NA   0.5124267
## 16  1.3438166  0.0000000          NA   1.4584452

And, finally, the same for the fixed effects:
fixef(m1)
##       (Intercept)           genderM         phasepost
##      6.000000e+00      7.500000e-01     -6.250000e-01
##          phasepre genderM:phasepost  genderM:phasepre
##     -2.000000e+00     -1.243450e-15     -1.687539e-15
fixef(m2)
##      (Intercept)           genderM         phasepost
##     6.000000e+00      7.500000e-01     -6.250000e-01
##         phasepre genderM:phasepost  genderM:phasepre
##    -2.000000e+00      1.829648e-14      1.820766e-14

Hope that helps,
Henrik

On Jan 18, 2017, at 5:18 PM, Chen, Gang (NIH/NIMH) [C] <gangchen at mail.nih.gov> wrote:

Happy New Year, Henrik! Thanks for explaining the details. A couple of days after I posted the question, I realized that my question was silly! Once I laid out the LME model equation, my original confusion was resolved.

Actually I meant to ask a slightly different question. Let me use the dataset embedded in your ‘afex’ package as an example:

data(obk.long, package = "afex")

Suppose that my base model is

lmer(value ~ gender*phase+(1|id), data=obk.long)

Is there a way to specify a different variance for each gender in one model?

Thanks,
Gang

On Jan 13, 2017, at 12:06 PM, Henrik Singmann <singmann at psychologie.uzh.ch> wrote:

Hi Gang,

Sorry that I am so late to the party, but in case you are still interested I will reply (and, of course, for the archive).

The answer is basically given in the old faq:
http://glmm.wikidot.com/faq#toc27

(1|site/block) = (1|site)+(1|site:block)

Which is exactly what is given in your output: a random intercept for Worker and a random intercept for each Worker:Machine combination.
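
To make the two grouping factors concrete, here is a small base-R sketch. It does not use the Machines data itself; it builds a toy layout with 6 workers and 2 machines (level C dropped), and the 3 replicates per cell are an assumption made for illustration. The names toy and worker_machine are hypothetical.

```r
# Toy layout mirroring the design discussed in this thread:
# 6 workers, 2 machines (level C dropped), 3 replicates each (assumed)
toy <- expand.grid(Worker = factor(1:6),
                   Machine = factor(c("A", "B")),
                   rep = 1:3)

# Grouping factor behind the (1 | Worker) term: one level per worker
nlevels(toy$Worker)

# Grouping factor behind the (1 | Worker:Machine) term:
# one level per worker-machine combination
worker_machine <- interaction(toy$Worker, toy$Machine, drop = TRUE)
nlevels(worker_machine)
```

Each level of either factor gets its own random intercept, which is why neither variance component is tied to a particular level of Machine.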

To answer your questions: the random intercepts do not have base or reference levels. They are increments or decrements to the overall intercept for each level of Worker or each Machine:Worker combination. The reported variance is the estimated variance of these increments, which is most likely unequal to the actual variance you would obtain by calculating it from the estimated increments, which are sometimes called BLUPs (I wonder if a better term for those exists).
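
As an illustration of that last point, the sketch below compares the two quantities for the obk.long model m1 discussed elsewhere in this thread, not for the Machines model. The conditional modes and the intercept variance are copied from the ranef(m1) and summary(m1) output quoted in the thread; the variable names are made up.

```r
# Conditional modes (BLUPs) of the id intercepts from m1,
# copied from the ranef(m1) output quoted in this thread
blups_m1 <- c(-3.0351426, -2.0891242, 0.1182523, -1.0642708,
              -0.4335918, 2.6409683, 1.3796103, -0.4335918,
              -0.7489313, 0.4335918, -1.1431057, 1.6949498,
              1.4584452, -0.7489313, 0.5124267, 1.4584452)

# Intercept variance estimated by the model for m1
var_id_m1 <- 2.565887

# Sample variance of the conditional modes
var_blups <- var(blups_m1)

# The two need not agree; here the modes are less spread out (shrinkage)
var_blups < var_id_m1
```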

Hope that helps,
Henrik

PS: Belated Happy New Year to everyone.

Am 05.01.2017 um 17:28 schrieb Chen, Gang (NIH/NIMH) [C]:
Suppose that I have the following dataset in R:

library(lme4)
data(Machines,package="nlme")
mydata <- Machines[Machines$Machine != 'C', ]

With the following model:

print(lmer(score ~ 1 + (1|Worker/Machine), data=mydata), ranef.comp="Var")

I have the variance components as shown below:

Random effects:
Groups         Name        Variance
Machine:Worker (Intercept) 46.00
Worker         (Intercept) 13.84
Residual                    1.16

I have trouble understanding exactly what the first two components are: Machine:Worker and Worker? Specifically,

1) What is the variance for Worker: does it correspond to the base (or reference) level of the factor Machine? If so, what is the base level: the first level in the dataset or alphabetically the first level (it happens to be the same in this particular dataset)?

2) What is the variance for Machine:Worker? Is it the variance for the second level of the factor Machine, or the extra variance relative to the variance for Worker?

Furthermore, for the model:

print(lmer(score ~ 1 + (1|Worker/Machine), data=Machines), ranef.comp="Var")

what is the variance for Machine:Worker in the following result since there are 3 levels involved in the factor Machine?

Random effects:
Groups         Name        Variance
Machine:Worker (Intercept) 60.2972
Worker         (Intercept)  7.3959
Residual                    0.9246

Thanks,
Gang
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
