[R-sig-ME] Why lme4 doesn't throw an error for an illegitimate model

Wed Oct 7 23:59:27 CEST 2020

Also Phillip, what does `sector` in the output of `VarCorr(mn)` below
denote, now that you say this model is mathematically defined?

mn <- lmer(math ~ ses +  (sector | sch.id), data = hsb)

> VarCorr(mn)
 Groups   Name        Std.Dev. Corr
 sch.id   (Intercept) 2.0256
          sector      1.3717   -0.071
 Residual             6.0858

On Wed, Oct 7, 2020 at 4:21 PM Simon Harmel <sim.harmel using gmail.com> wrote:

> Thank you Phillip. The data structure is exactly the way you understood
> it. Another nonsensical model that runs without warning would be m2, where
> `ses` is a predictor that varies both within and across `sch.id`
> (clusters), and `sector` is a binary variable that only varies across
> the `sch.id` (clusters).
>
> The cross-level interaction is set by the software syntax to AGAIN vary
> across the levels of a grouping variable. But the model runs without
> warning.
>
> hsb <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/hsb.csv')
> m2 <- lmer(math ~ ses+sector + (ses:sector | sch.id), data = hsb)
>
> On Wed, Oct 7, 2020 at 3:58 PM Phillip Alday <phillip.alday using mpi.nl> wrote:
>
>> Without knowing the structure of the data, it's hard to answer the
>> question ... there are ways for me to explore and find out what the
>> nesting/grouping structure is, but that takes time I don't have. So
>> here's a quick attempt just using general principles.
>>
>> I think the first trick is to stop thinking of a variable as an ith-level
>> predictor. For one thing, the levels don't have to be strictly nested.
>> Personally, I find the i-th level terminology confusing because I can
>> never remember which direction we're counting from.
>>
>> Dropping the lm/lmer calls for convenience in typing:
>>
>> math ~ 1 + ses
>>
>> this estimates a model with a slope for ses and an intercept.
>>
>> math ~ 1 + ses + (1|sch.id)
>>
>> this estimates a model with a slope for ses and an intercept and an
>> offset for each sch.id. (The estimated offsets are technically
>> "predictions", because the variance of the offsets is what's being
>> estimated, but we'll leave that be for now.)
>>
>> An offset to what? Well the population-level/fixed effect corresponding
>> to the intercept. So you can think of this random effect as providing an
>> adjustment to the intercept for each sch.id
>>
>>
>> math ~ 1 + ses + (1+ses|sch.id)
>>
>> this estimates a model with a slope for ses and an intercept; as well as
>> per-sch.id adjustments for the intercept and ses.
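
A minimal sketch of the "adjustment" reading described above, assuming simulated stand-in data rather than the real hsb file (the sizes, effect values, and the names `d`, `u0`, `u1` are hypothetical). It fits the random-slope model and compares the fixed effects, the predicted per-school adjustments, and their sum as returned by `coef()`.

```r
library(lme4)

# Simulated stand-in for the hsb data (hypothetical values, not the real file)
set.seed(1)
n_sch <- 30
n_per <- 20
d <- data.frame(sch.id = factor(rep(seq_len(n_sch), each = n_per)),
                ses    = rnorm(n_sch * n_per))
u0 <- rnorm(n_sch, sd = 2)  # per-school intercept offsets
u1 <- rnorm(n_sch, sd = 1)  # per-school ses-slope offsets
d$math <- 12 + u0[d$sch.id] + (2 + u1[d$sch.id]) * d$ses +
  rnorm(nrow(d), sd = 3)

m <- lmer(math ~ 1 + ses + (1 + ses | sch.id), data = d)

fixef(m)               # population-level intercept and ses slope
head(ranef(m)$sch.id)  # predicted per-school adjustments to each
head(coef(m)$sch.id)   # fixed effect plus adjustment, per school

# Rebuild coef() by hand: add each fixed effect to its column of adjustments
adj    <- ranef(m)$sch.id
manual <- sweep(as.matrix(adj), 2, fixef(m)[colnames(adj)], "+")
all.equal(manual, as.matrix(coef(m)$sch.id), check.attributes = FALSE)
```

The last line only checks that the per-school coefficients are the fixed effects plus the per-school adjustments; it says nothing about whether the model is a sensible one for a given question.
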
>>
>> Note by the way that the model
>>
>> math ~ 1
>>
>> is a special case of
>>
>> math ~ 1 + ses
>>
>> with the slope of ses set / assumed to be zero.
>>
>> This will help in the next step.
>>
>> math ~ 1 + (1+ses|sch.id)
>>
>> this estimates a model with an intercept; as well as per-sch.id
>> adjustments for the intercept and ses. But where is ses in the fixed
>> effects? Well, you can think of it as being zero (see previous point),
>> so the adjustments will be from the assumed slope of zero, instead of
>> the estimated slope.
>>
>> This brings us to
>>
>> math ~ 1 + ses + (1 + sector | sch.id)
>>
>> this estimates a model with a slope for ses and an intercept; as well as per-sch.id
>> adjustments for the intercept and sector.
>>
>> This is mathematically well-defined, even if there is only one value of
>> sector observed for each sch.id (which is the case when sch.id is nested
>> within sector), because the shrinkage of the random effects deals with
>> the rank deficiency. (If you didn't understand that sentence: it's still
>> possible to estimate these quantities.) In the case that sector doesn't
>> vary within sch.id, you'll get an estimate that reflects this: either it
>> will be perfectly correlated with the intercept or shrunk to zero. In
>> other words, that's one way to get a singular/boundary fit.
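
A rough sketch of how one might check the situation described above, again on simulated stand-in data rather than the real hsb file (the values and the names `d`, `sector_by_school`, `n_sector_values` are hypothetical, and `sector` is assumed to be coded 0/1). It first confirms that `sector` takes a single value within each `sch.id`, then fits the model and inspects the variance components and the singularity flag.

```r
library(lme4)

# Simulated stand-in (hypothetical): sector is constant within each school
set.seed(2)
n_sch <- 40
n_per <- 25
sector_by_school <- rep(0:1, length.out = n_sch)
d <- data.frame(sch.id = factor(rep(seq_len(n_sch), each = n_per)),
                ses    = rnorm(n_sch * n_per))
d$sector <- sector_by_school[d$sch.id]
u0 <- rnorm(n_sch, sd = 2)  # per-school intercept offsets
d$math <- 12 + 2 * d$ses + 1.5 * d$sector + u0[d$sch.id] +
  rnorm(nrow(d), sd = 6)

# Distinct sector values per school; all equal to 1 means sector never
# varies within a school
n_sector_values <- tapply(d$sector, d$sch.id, function(x) length(unique(x)))
all(n_sector_values == 1)

mn <- lmer(math ~ 1 + ses + (1 + sector | sch.id), data = d)
VarCorr(mn)     # sector Std.Dev. and its correlation with the intercept
isSingular(mn)  # whether lme4 flags the fit as singular (boundary)
```

Whether a particular fit is flagged as singular depends on the data, so the output of `isSingular()` is something to inspect rather than assume.
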
>>
>> Now mathematically well-defined doesn't mean that it makes sense
>> inferentially. And that's where it's incumbent upon the user to think
>> about their inferential question, their data, and their phrasing of the
>> inferential question as a statistical model.
>>
>> Phillip
>>
>> On 7/10/20 9:30 pm, Simon Harmel wrote:
>> > correction:
>> >
>> > As far as I understand, in a 2-level model, the use of a level-2 predictor
>> > (here "sector") in the ***random part*** is illegitimate.
>> >
>> > But I wonder why in the following model `lmer` doesn't throw an error to
>> > indicate that? What does the random part estimate in the output?
>> >
>> > library(lme4)
>> > hsb <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/hsb.csv')
>> >
>> > mn <- lmer(math ~ ses + (sector | sch.id), data = hsb)
>> >
>> >
>> >
>> > On Wed, Oct 7, 2020 at 2:24 PM Simon Harmel <sim.harmel using gmail.com> wrote:
>> >
>> >> Dear All,
>> >>
>> >> As far as I understand, in a 2-level model, the use of a level-2 predictor
>> >> (here "sector") is illegitimate.
>> >>
>> >> But I wonder why in the following model `lmer` doesn't throw an error to
>> >> indicate that?
>> >>
>> >> library(lme4)
>> >> hsb <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/hsb.csv')
>> >>
>> >> mn <- lmer(math ~ ses + (sector | sch.id), data = hsb)
>> >>
>> >
>> >
>> > _______________________________________________
>> > R-sig-mixed-models using r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>> >
>>
>
