[R-sig-ME] Understanding variance components

Thu Jan 19 16:55:00 CET 2017

Hey Gang, hey Henrik (!) :)

very insightful query, hope it is okay that I briefly join it, on the last
question.
Variance is a result of division of squared deviations by sample size. In
the complete sample, this means divided by 240; and in the gender samples
by 120 each.
Have a look on the reversed equations for obtaining the overall variance by
combining gender groups, weighted by sample size:

(120 * 0.9944589   +   4.137316 * 120)/240 = overall variance (should be
equal to)  = (0.9944589 + 4.137316)/2   acording to your equation gang
left hand can be written as:
120 * (0.9944589   +   4.137316) / 240
which is:
(0.9944589   +   4.137316)/2

so nothing to worry, I think :)

Best wishes,
René

2017-01-19 16:14 GMT+01:00 Chen, Gang (NIH/NIMH) [C] <gangchen at mail.nih.gov>
:

> Nice, Henrik!
>
> One thing we need to resolve, though, is this.
>
> ====================
> The variances for model m1:
>
> (1) intercept
>
> summary(m1)$varcor$id[1]
> [1] 2.565887
>
> (2) residuals
>
> attr(summary(m1)$varcor, "sc")^2
> [1] 2.196212
>
> ====================
> And the variances for model m2:
>
> (1) female
>
> summary(m2)$varcor$id[1]
> [1] 0.9944589
>
> (2) male
>
> summary(m2)$varcor$id.1[1]
> [1] 4.137316
>
> (3) residuals
>
> attr(summary(m2)$varcor, "sc")^2
> [1] 2.196212
> ====================
>
> It’s great to see that the residual variance matches between the two
> models m1 and m2. However, the intercept variance in m1, 2.565887, is not
> equal to the sum of female and male variances, 0.9944589 + 4.137316 =
> 5.131775. However, if we divide the total variance of female and male by 2,
> we have (0.9944589 + 4.137316)/2 = 2.565887. Why is that?
>
> If we code the two groups as
>
> obk.long$gender_F <- sqrt(2)*as.numeric(obk.long$gender == "F")
> obk.long$gender_M <- sqrt(2)*as.numeric(obk.long$gender == "M”)
>
> then we have the desired result,
>
> m2 <- lmer(value ~ gender*phase+(0+gender_F|id)+(0+gender_M|id),
> data=obk.long)
> summary(m2)$varcor$id[1]+summary(m2)$varcor$id.1[1]
> [1] 2.565888
>
> Even though the variance part is reconciled, I cannot come up with a good
> explanation as to why this coding strategy is required. Any thought?
>
> Thanks,
> Gang
>
>
> On Jan 19, 2017, at 6:13 AM, Henrik Singmann <singmann at psychologie.uzh.ch<
> mailto:singmann at psychologie.uzh.ch>> wrote:
>
> Hi Gang,
>
> I have an idea which is based on the last example given on ?lmer:
> ## Fit sex-specific variances by constructing numeric dummy variables
> ...
>
> I am not sure if this is entirely correct, but it looks good to me. If
> not, hopefully someone more knowledgeable will jump in.
>
> ## Original model without gender specific variance:
> data(obk.long, package = "afex")
>
> m1 <- lmer(value ~ gender*phase+(1|id), data=obk.long)
> summary(m1)$varcor
> ## Groups   Name        Std.Dev.
> ## id       (Intercept) 1.6018
> ## Residual             1.4820
> REMLcrit(m1)
> ## [1] 911.1599
>
> ## to get gender specific vari8ances, we construct two dummy variables:
> obk.long$gender_F <- as.numeric(obk.long$gender == "F")
> obk.long$gender_M <- as.numeric(obk.long$gender == "M")
> m2 <- lmer(value ~ gender*phase+(0+gender_F|id)+(0+gender_M|id),
> data=obk.long)
> summary(m2)$varcor
> ## Groups   Name     Std.Dev.
> ## id       gender_F 0.99723
> ## id.1     gender_M 2.03404
> ## Residual          1.48196
> REMLcrit(m2)
> ## [1] 908.297
>
> So far, looks reasonably close. Same for the conditional modes (thx
> Phillip). Left two columns is separate, right is joint variance.
> cbind(ranef(m2)$id, rep(NA, 16), ranef(m1)$id)
> ##      gender_F   gender_M rep(NA, 16) (Intercept)
> ## 1   0.0000000 -3.0986753          NA  -3.0351426
> ## 2   0.0000000 -2.1328544          NA  -2.0891242
> ## 3   0.0000000  0.1207276          NA   0.1182523
> ## 4  -0.9806229  0.0000000          NA  -1.0642708
> ## 5  -0.3995130  0.0000000          NA  -0.4335918
> ## 6   0.0000000  2.6962499          NA   2.6409683
> ## 7   0.0000000  1.4084888          NA   1.3796103
> ## 8  -0.3995130  0.0000000          NA  -0.4335918
> ## 9  -0.6900680  0.0000000          NA  -0.7489313
> ## 10  0.0000000  0.4426679          NA   0.4335918
> ## 11  0.0000000 -1.1670336          NA  -1.1431057
> ## 12  0.0000000  1.7304291          NA   1.6949498
> ## 13  1.3438166  0.0000000          NA   1.4584452
> ## 14 -0.6900680  0.0000000          NA  -0.7489313
> ## 15  0.4721518  0.0000000          NA   0.5124267
> ## 16  1.3438166  0.0000000          NA   1.4584452
>
> And, finally, the same for the fixed effects:
> fixef(m1)
> ##       (Intercept)           genderM         phasepost
> ##      6.000000e+00      7.500000e-01     -6.250000e-01
> ##          phasepre genderM:phasepost  genderM:phasepre
> ##     -2.000000e+00     -1.243450e-15     -1.687539e-15
> fixef(m2)
> ##      (Intercept)           genderM         phasepost
> ##     6.000000e+00      7.500000e-01     -6.250000e-01
> ##         phasepre genderM:phasepost  genderM:phasepre
> ##    -2.000000e+00      1.829648e-14      1.820766e-14
>
> Hope that helps,
> Henrik
>
> On Jan 18, 2017, at 5:18 PM, Chen, Gang (NIH/NIMH) [C] <
> gangchen at mail.nih.gov<mailto:gangchen at mail.nih.gov>> wrote:
>
> Happy New Year, Henrik! Thanks for explaining the details. A couple of
> days after I posted the question, I realized that my question was silly!
> Once I laid out the LME model equation, my original confusion was resolved.
>
> Actually I meant to ask a slightly different question. Let me use the
> dataset embedded in your ‘afex’ package as an example:
>
> data(obk.long, package = "afex”)
>
> Suppose that my base model is
>
> lmer(value ~ gender*phase+(1|id), data=obk.long)
>
> Is there a way to specify a different variance for each gender in one
> model?
>
> Thanks,
> Gang
>
>
> On Jan 13, 2017, at 12:06 PM, Henrik Singmann <singmann at psychologie.uzh.ch
> <mailto:singmann at psychologie.uzh.ch>> wrote:
>
> Hi Gang,
>
> Sorry that I so am late to the party, but in case you are still interested
> I will reply (and, of course, for the archive).
>
> The answer is basically given in the old faq:
> http://glmm.wikidot.com/faq#toc27
>
> (1|site/block) = (1|site)+(1|site:block)
>
> Which is exactly what is given in your output. A random intercept for
> Worker and a random intercept for each worker:Machine interaction.
>
> To answer your questions. The random intercepts do not have base or
> reference levels. They are increments or decrements to the overall
> intercept for each level of Worker or the Machine:Worker combination. The
> reported variance is the estimated variance of these increments, which is
> most likely unequal to the actual variance you would obtain by calculating
> it from the estimated increments, which are sometimes called BLUPs (I
> wonder if a better term for those exist).
>
> Hope that helps,
> Henrik
>
> PS: Belated Happy New Year to everyone.
>
>
> Am 05.01.2017 um 17:28 schrieb Chen, Gang (NIH/NIMH) [C]:
> Suppose that I have the following dataset in R:
>
> library(lme4)
> data(Machines,package="nlme")
> mydata <- Machines[Machines$Machine!='C’,]
>
> With the following model:
>
> print(lmer(score ~ 1 + (1|Worker/Machine), data=mydata), ranef.comp="Var")
>
> I have the variance components as shown below:
>
> Random effects:
> Groups         Name        Variance
> Machine:Worker (Intercept) 46.00
> Worker         (Intercept) 13.84
> Residual                    1.16
>
> I have trouble understanding exactly what the first two components are:
> Machine:Worker and Worker? Specifically,
>
> 1) What is the variance for Worker: corresponding to the base (or
> reference) level of the factor Machine? If so, what is the base level: the
> first level in the dataset or alphabetically the first level (it happens to
> be the same in this particular dataset)?
>
> 2) What is the variance for Machine:Worker? Is it the variance for the
> second level of the factor Machine, or the extra variance relative to the
> variance for Worker?
>
> Furthermore, for the model:
>
> print(lmer(score ~ 1 + (1|Worker/Machine), data=Machines),
> ranef.comp="Var")
>
> what is the variance for Machine:Worker in the following result since
> there are 3 levels involved in the factor Machine?
>
> Random effects:
> Groups         Name        Variance
> Machine:Worker (Intercept) 60.2972
> Worker         (Intercept)  7.3959
> Residual                    0.9246
>
> Thanks,
> Gang
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
> _______________________________________________
> R-sig-mixed-models at r-project.org<mailto:R-sig-mixed-models at r-project.org>
> mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>

	[[alternative HTML version deleted]]