[R-sig-ME] Standard Error of a coef. in a 2-level model vs. 2 OLS models

Thu Sep 17 19:37:54 CEST 2020

Thanks Phillip!

In other words, when we use the full data in a regular ols model (`ols1`
below), then the residual variation [i.e.,  sigma(ols1)^2] approximately
equals the sum of within- and between-cluster variation from a
corresponding mixed model (`m1` below).

But if we use cluster-level data in a regular ols model (`ols2` below), no
correspondence between the amounts of variation between the `ols2` model
and the corresponding mixed model (`m1` below) can be found?

library(lme4)
library(tidyverse)

hsb <- read.csv('
https://raw.githubusercontent.com/rnorouzian/e/master/hsb.csv')
hsb_ave <- hsb %>% group_by(sch.id) %>% mutate(math_ave = mean(math)) %>%
slice(1)

ols1 <- lm(math ~ sector, data = hsb)
summary(ols1)

m1 <- lmer(math ~ sector + (1|sch.id), data = hsb)
summary(m1)

ols2 <- lm(math_ave ~ sector, data = hsb_ave)
summary(ols2)

On Thu, Sep 17, 2020 at 12:23 PM Phillip Alday <phillip.alday using mpi.nl> wrote:

> The answer is no.
>
> Consider the case where the between-cluster variation goes to zero (i.e.
> a singular fit), but the residual variation is not zero. The
> between-cluster variation is clearly not the same as the "total
> variation" / residual variation in a non mixed OLS regression on the
> data aggregated within clusters.
>
> The other way to see this is that lme4 actually uses the scaled random
> effects, i.e. the variance relative to the residual variance, for
> fitting. (This what the entries in the theta vector reflect: the lower
> Cholesky factor of the variance-covariance matrix of the scaled random
> effects.)
>
> Best,
>
> Phillip
>
> On 17/09/2020 19:15, Simon Harmel wrote:
> > Thanks Harold! I may have been insufficiently clear.
> >
> > At its core, the question is asking: "Is the "between-cluster" variation
> > (i.e., τ00) in a two-level nested mixed-effects model (i.e., `m1`), the
> > same as the amount of "total variation" that we would otherwise get if we
> > only use the cluster-level data in a corresponding, NON-HLM regression
> > model (i.e., `ols2`)?"
> >
> >
> > library(lme4)
> > library(tidyverse)
> >
> > hsb <- read.csv('
> > https://raw.githubusercontent.com/rnorouzian/e/master/hsb.csv')
> > hsb_ave <- hsb %>% group_by(sch.id) %>% mutate(math_ave = mean(math))
> %>%
> > slice(1)
> >
> > ols2 <- lm(math_ave ~ sector, data = hsb_ave)
> >
> > m1 <- lmer(math ~ sector + (1|sch.id), data = hsb)
> >
> > On Thu, Sep 17, 2020 at 11:16 AM Harold Doran <
> > harold.doran using cambiumassessment.com> wrote:
> >
> >> Simon
> >>
> >>
> >>
> >> Crossposting like this is frowned on a bit, but I’ve been in your shoes
> >> trying to get an answer before. I think you might be a bit confused. I
> saw
> >> your questions in both places and you’re asking how to get the standard
> >> errors of the OLS model using by fitting a mixed model and using the
> >> variance components from that mixed model to get the standard errors
> from
> >> an OLS model.
> >>
> >>
> >>
> >> This is what we refer to in statistics as “bass-ackwards”. 😊
> >>
> >>
> >>
> >> If you want the standard errors of the fixed effects from an OLS model,
> >> compute them as follows:
> >>
> >>
> >>
> >> ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
> >>
> >> trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
> >>
> >> group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
> >>
> >> weight <- c(ctl, trt)
> >>
> >> lm.D9 <- lm(weight ~ group)
> >>
> >> X <- model.matrix(lm.D9)
> >>
> >>
> >>
> >> ### Compute Standard errors of fixed effects
> >>
> >> sqrt(diag(solve(crossprod(X)) * .6964^2))
> >>
> >>
> >>
> >> ### use built in extractor function to get them instead
> >>
> >> sqrt(diag(vcov(lm.D9)))
> >>
> >>
> >>
> >> Similarly, get the standard errors of the mixed model from its own
> >> variance/covariance matrix.
> >>
> >>
> >>
> >>
> >>
> >> *From:* Simon Harmel <sim.harmel using gmail.com>
> >> *Sent:* Wednesday, September 16, 2020 5:54 PM
> >> *To:* Harold Doran <harold.doran using cambiumassessment.com>
> >> *Cc:* r-sig-mixed-models <r-sig-mixed-models using r-project.org>
> >> *Subject:* Re: [R-sig-ME] Standard Error of a coef. in a 2-level model
> >> vs. 2 OLS models
> >>
> >>
> >>
> >> Hi Harold,
> >>
> >>
> >>
> >> I improved my question, and asked it on CrossValidated: (
> >>
> https://stats.stackexchange.com/questions/487363/using-variance-components-of-a-mixed-model-to-obtain-std-error-of-a-coef-from-a
> >> )
> >>
> >>
> >>
> >> I would appreciate your answer, either here or on CrossValidated.
> >>
> >>
> >>
> >> Many thanks, Simon
> >>
> >>
> >>
> >> On Wed, Sep 16, 2020 at 4:45 PM Harold Doran <
> >> harold.doran using cambiumassessment.com> wrote:
> >>
> >> This is not how standard errors are computed for linear or mixed linear
> >> models. I'm not sure what you're goal is, but the SEs are the square
> roots
> >> of the diagonal elements of the variance/covariance matrix of the fixed
> >> effects.
> >>
> >> See ?vcov on how to extract that matrix.
> >>
> >> -----Original Message-----
> >> From: R-sig-mixed-models <r-sig-mixed-models-bounces using r-project.org> On
> >> Behalf Of Simon Harmel
> >> Sent: Sunday, September 13, 2020 7:51 PM
> >> To: r-sig-mixed-models <r-sig-mixed-models using r-project.org>
> >> Subject: Re: [R-sig-ME] Standard Error of a coef. in a 2-level model
> vs. 2
> >> OLS models
> >>
> >> External email alert: Be wary of links & attachments.
> >>
> >>
> >> Just a clarification.
> >>
> >> For `ols1` model, I can approximate its SE of the sector coefficient by
> >> using the within and between variance components from the HLM model:
> >> sqrt(( 6.68  + 39.15  )/45)/(160*.25))
> >>
> >> BUT  For `ols2` model, how can I approximate its SE of the sector
> >> coefficient by using the within and between variance components from the
> >> HLM model?
> >>
> >> On Sun, Sep 13, 2020 at 6:37 PM Simon Harmel <sim.harmel using gmail.com>
> wrote:
> >>
> >>> Dear All,
> >>>
> >>> I have fit two ols models (ols1 & ols2) and an mixed-effects model
> (m1).
> >>> ols1 is a simple lm() model that ignores the second-level. ols2 is a
> >>> simple
> >>> lm() model that ignores the first-level.
> >>>
> >>> For `ols1` model,  `sigma(ols1)^2` almost equals sum of variance
> >>> components in the `m1` model: 6.68 (bet.) + 39.15 (with.) For `ols2`
> >>> model, I wonder what does `sigma(ols2)^2` represents when compared to
> >>> the `m1` model?
> >>>
> >>> Here is the fully reproducible code:
> >>>
> >>> library(lme4)
> >>> library(tidyverse)
> >>>
> >>> hsb <- read.csv('
> >>> https://raw.githubusercontent.com/rnorouzian/e/master/hsb.csv')
> >>> hsb_ave <- hsb %>% group_by(sch.id) %>% mutate(math_ave = mean(math))
> >>> %>%
> >>> slice(1) # data that only considers grouping but ignores lower level
> >>>
> >>> ols1 <- lm(math ~ sector, data = hsb)
> >>> summary(ols1)
> >>>
> >>> m1 <- lmer(math ~ sector + (1|sch.id), data = hsb)
> >>> summary(m1)
> >>>
> >>> # `sigma(ols1)^2` almost equals 6.68 (bet.) + 39.15 (with.) from lmer
> >>>
> >>> But if I fit another ols model that only considers the grouping
> >>> structure (ignoring lower level):
> >>>
> >>> ols2 <- lm(math_ave ~ sector, data = hsb_ave)
> >>> summary(ols2)
> >>>
> >>> Then what does `sigma(ols2)^2` should amount to when compared to the
> >>> `m1` model?
> >>>
> >>         [[alternative HTML version deleted]]
> >>
> >> _______________________________________________
> >> R-sig-mixed-models using r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>
> >>
> >       [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > R-sig-mixed-models using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>

	[[alternative HTML version deleted]]