[R-sig-ME] Why lme4 doesn't throw an error for an illegitimate model
Ben Bolker
bbolker at gmail.com
Thu Oct 8 03:33:25 CEST 2020
Some more comments.
* I continue to think that you're mildly abusing the system by
cross-posting very closely related questions on CrossValidated and here
(e.g.
https://stats.stackexchange.com/questions/+/visualizing-the-folly-of-fitting-random-slopes-for-variables-that-dont-vary-wit)
* although the results would be identical if sector were a categorical
rather than a numeric variable, the fact that it's a 0/1 variable makes
it clearer what's actually happening (and that the result may not be
what you want, but is not "nonsensical").
Take the example (sector|sch.id). The model matrix for the random
effects (Z) is formed by taking the _Khatri-Rao product_ of the model
matrix of the terms (~ 1 + sector) with the indicator matrix for the
grouping variable. In practice this means that if

  b_(0,j(i)) is the random intercept for the school j(i) associated
     with the i-th observation,
  b_(1,j(i)) is the random slope for school j(i), and
  sector(j(i)) is the sector associated with school j(i),

then the random-effect term for the i-th observation is

  b_(0,j(i)) + b_(1,j(i)) * sector(j(i)),

which for sector==0 reduces to

  b_(0,j(i))

and for sector==1 to

  b_(0,j(i)) + b_(1,j(i)).
This therefore translates to modeling a different variance among
schools with sector==1, namely Var(b_0) + Var(b_1) + 2*Cov(b_0,b_1)
rather than Var(b_0).
So in my opinion this does make sense, even if it's not what you had
in mind.
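To make the Khatri-Rao construction concrete, here's a small sketch of my
own (the toy data and the b values are made up, and this hand-rolls the Z
matrix rather than using lme4's internal code): build Z for
(sector | sch.id) from the two pieces described above, and check that each
observation's random effect comes out as b_0 + b_1 * sector.

```r
## Toy illustration: Z as the row-wise Khatri-Rao product of the
## grouping-factor indicator matrix and the RE-term model matrix.
library(Matrix)

df <- data.frame(
  sch.id = factor(rep(c("A", "B"), each = 2)),
  sector = c(0, 0, 1, 1)   # school A: sector 0, school B: sector 1
)

X <- model.matrix(~ 1 + sector, df)  # model matrix of the RE terms
J <- model.matrix(~ 0 + sch.id, df)  # indicator matrix of the grouping factor

## Khatri-Rao (row-wise Kronecker) product; columns are ordered
## (Intercept, sector) within school A, then within school B
Z <- t(KhatriRao(t(J), t(X)))

## hypothetical BLUPs: (b0, b1) for school A, then (b0, b1) for school B
b <- c(1.0, 0.5, -0.3, 2.0)

as.vector(Z %*% b)
## rows for school A (sector 0): b0_A        = 1.0
## rows for school B (sector 1): b0_B + b1_B = 1.7
```

For sector==0 rows only the intercept column of that school is nonzero, so
the slope b_1 drops out exactly as described above.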
* more generally, I _do_ think it would be good to have a system that
kept track where appropriate of 'levels' (recognizing that they're not
always appropriate); however, in the absence of that kind of metadata,
it's very hard to detect simply from the structure of the model whether
a model is 'misspecified'.
* this is one reason that Richard McElreath promotes building models
more 'from scratch' in his _Statistical Rethinking_ book; having to
build the model components more explicitly yourself is a nuisance, but
it reduces the chance that you'll get confused about the meaning of a
model specification.
On 10/7/20 7:29 PM, Phillip Alday wrote:
> Potentially something that is uninteresting or nonsensical or not
> physically real ... the math for a Klein bottle is well defined, but I
> doubt you'll find one in the three spatial dimensions we experience. ;)
>
>
> Here it is actually capturing some aspects of the different sectors:
>
>> summary(coef(lmer(math ~ (sector | sch.id), data = hsb))$sch.id[,"sector"])
> Min. 1st Qu. Median Mean 3rd Qu. Max.
> -2.0087 -0.1117 0.4981 0.4908 1.0710 2.8777
>
>> summary(lmer(math ~ sector + (1|sch.id), data=hsb))
> Linear mixed model fit by REML ['lmerMod']
> Formula: math ~ sector + (1 | sch.id)
> Data: hsb
>
> REML criterion at convergence: 47080.1
>
> Scaled residuals:
> Min 1Q Median 3Q Max
> -3.0130 -0.7523 0.0253 0.7602 2.7472
>
> Random effects:
> Groups Name Variance Std.Dev.
> sch.id (Intercept) 6.677 2.584
> Residual 39.151 6.257
> Number of obs: 7185, groups: sch.id, 160
>
> Fixed effects:
> Estimate Std. Error t value
> (Intercept) 11.3930 0.2928 38.907
> sector 2.8049 0.4391 6.388
>
>
> Basically, it's the (shrunken) by-school sector adjustment relative to
> the (zeroed-out) fixed/population-level effect. Since you've omitted
> sector from the fixed effects, the adjustments correspond to
> (shrunken) estimates (well, predictions, see previous fine print) of
> what the total population + sch.id-level effect would be. Because the
> sch.id-level effect is in reality ideally/approximately(*) zero, these
> work out to be approximations to the population-level effects.
>
> (*) "ideally/approximately" because the variability between schools may
> differ between sectors. In other words, the public schools may have
> adjustments distributed as N(public_mean, public_sd) and the private
> schools may have adjustments distributed as N(private_mean, private_sd),
> where the two SDs aren't equal. The means are handled by the fixed
> effects, the SDs by the random effects. This is what you get from this
> model:
>
>> summary(lmer(math ~ sector + (1+sector|sch.id), data=hsb))
> Linear mixed model fit by REML ['lmerMod']
> Formula: math ~ sector + (1 + sector | sch.id)
> Data: hsb
>
> REML criterion at convergence: 47080.1
>
> Scaled residuals:
> Min 1Q Median 3Q Max
> -3.01392 -0.75219 0.02518 0.76045 2.74806
>
> Random effects:
> Groups Name Variance Std.Dev. Corr
> sch.id (Intercept) 6.7346 2.5951
> sector 0.5322 0.7295 -0.17
> Residual 39.1513 6.2571
> Number of obs: 7185, groups: sch.id, 160
>
> Fixed effects:
> Estimate Std. Error t value
> (Intercept) 11.3930 0.2939 38.762
> sector 2.8048 0.4387 6.394
>
>> summary(coef(lmer(math ~ sector + (sector | sch.id), data = hsb))$sch.id[,"sector"])
> Min. 1st Qu. Median Mean 3rd Qu. Max.
> 2.531 2.741 2.806 2.805 2.860 3.104
>
> I'll end with a question which I won't answer, but which may help you
> think about why fitting these models might be useful: how is this
> related to heteroskedasticity?
>
> Phillip
>
> On 7/10/20 11:59 pm, Simon Harmel wrote:
>> Also Phillip, what does `sector` in the output of `VarCorr(mn)` below
>> denote, now that you say this model is mathematically defined?
>>
>> mn <- lmer(math ~ ses + (sector | sch.id), data = hsb)
>>
>>> VarCorr(mn)
>> Groups Name Std.Dev. Corr
>> sch.id (Intercept) 2.0256
>> sector 1.3717 -0.071
>> Residual 6.0858
>>
>
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>