[R-sig-ME] "nesting for free" in lme4. Do I have this right?

Fri Apr 6 10:08:34 CEST 2018

Dear Jake,

IMHO you switched the implicit and explicit nesting. (1|region/state)
is explicit because you tell the model that those should be nested.
Implicit nesting occurs when you add the random effect as crossed
(1|region) + (1|state) but since each state has a unique ID and each
state occurs in only one region, the result is identical to the nested
random effects. Hence the implicit nesting.

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no
more than asking him to perform a post-mortem examination: he may be
able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data. ~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////

2018-04-05 23:07 GMT+02:00 Jake Westfall <jake.a.westfall at gmail.com>:
> Hi Paul,
>
> The "(1|state) + (1|region)" and "(1|region/state)" models are equivalent
> if states are explicitly nested (as opposed to implicitly nested) in
> regions. Explicitly nested means that the states in each region are labeled
> with unique IDs that don't repeat across regions. This equivalence is
> because (1|region/state) is just internally expanded to (1|region) +
> (1|region:state), but the "region:state" interaction term has the same
> number of unique levels as a simple "state" term if all states have unique
> IDs.
>
> The model with just "(1|state)" is not equivalent to either of the
> aforementioned models. It has fewer parameters and it lumps some of the
> region variance in with the state variance.
>
> As for identifiability, because you have many observations within each
> state, a model with random effects for both state and region is definitely
> identifiable, there shouldn't be any technical or conceptual problem. And
> it's probably a better model since it appears there is at least non-trivial
> region variance in your data and the model converges fine.
>
> Jake
>
> On Thu, Apr 5, 2018 at 3:51 PM, Paul Johnson <pauljohn32 at gmail.com> wrote:
>
>> I don't (yet) have permission to give you this data, but I've asked
>> for permission and hope I can show in future if you are interested.
>>
>> One of the students showed me a model and I said "oh, no, that won't
>> work" but glmer calculated estimates. He ran a model for US public
>> opinion with "(1|state) + (1|region)" and I thought it should be "(1 |
>> region/state)", but it turns out results are the same. This is what I
>> mean "nesting for free" in lme4.
>>
>> This author's field of study uses lme4 to calculate shrinkage
>> estimates for states, gender/ethnic subgroups. Things that I would
>> treat as individual level fixed effects, like gender, age, and
>> education, are inserted as random effects. Then they use the
>> conditional modes for constructing other quantities they want.
>>
>> After the student ran his model, I suggested instead: "(1 | state)".
>> This also converges, but the state-by-state effect estimates differ
>> from the nested model. I thought that was interesting.  The main
>> purpose here is to retrieve the conditional modes, NOT the variance
>> estimates.
>>
>> Which model is better/correct?
>>
>> Nested
>>
>> Generalized linear mixed model fit by maximum likelihood (Laplace
>>   Approximation) [glmerMod]
>>  Family: binomial  ( logit )
>> Formula: y ~ (1 | regionf/state)
>> Control: glmerControl(optimizer = "Nelder_Mead")
>>
>>      AIC      BIC   logLik deviance df.resid
>>  18888.9  18911.5  -9441.4  18882.9    13875
>>
>> Scaled residuals:
>>     Min      1Q  Median      3Q     Max
>> -1.0856 -0.8567 -0.7726  1.1097  1.4309
>>
>> Random effects:
>>  Groups        Name        Variance Std.Dev.
>>  state:regionf (Intercept) 0.05461  0.2337
>>  regionf       (Intercept) 0.01226  0.1107
>> Number of obs: 13878, groups:  state:regionf, 50; regionf, 4
>>
>> Fixed effects:
>>             Estimate Std. Error z value Pr(>|z|)
>> (Intercept) -0.26732    0.06965  -3.838 0.000124 ***
>> ---
>>
>>
>> Here are region and first few state conditional modes from the nested
>> model:
>>
>> $state
>>                 (Intercept)
>> Alabama        -0.188083365
>> Alaska         -0.103191227
>> Arizona         0.049081765
>> Arkansas       -0.108260203
>> California      0.436002015
>> Colorado        0.035086075
>> Connecticut     0.046271020
>> Delaware        0.080796357
>> Florida         0.062439432
>> ...
>>
>> $regionf
>>            (Intercept)
>> Northeast  0.139654257
>> Midwest   -0.002894665
>> South     -0.106236023
>> West      -0.027772517
>>
>> For reference, the first few random effects from (1 | state) are:
>>
>> > ranef(individual.modeld)
>> $state
>>                 (Intercept)
>> Alabama        -0.260553538
>> Alaska         -0.116579160
>> Arizona         0.046677297
>> Arkansas       -0.172353191
>> California      0.435397028
>> Colorado        0.032704790
>> Connecticut     0.162169289
>> Delaware        0.057368626
>> Florida        -0.013638881
>> ....
>>
>> The "total state effect" for, say, Alabama is the sum of the South
>> effect and the Alabama effect,
>> and that's different from the (1 | state) model.
>>
>> Alabama nested: -0.106236023 -0.188083365 = -0.2943194
>>
>> Alabama from (1 | state): -0.26
>>
>> Those are slightly different. Is one better?
>>
>> Seems to me the difference between the two is simply the shrinkage
>> effect of the PLS estimator, there's no substantively important
>> meaning in it.  The shrinkage is applied separately to (1 | region)
>> and (1 | state). The correlation between region and state effects is
>> assumed 0.  It is manufacturing 2 random effect layers where there
>> should be just one.
>>
>> If state is treated as a fixed effect, of course, the region variable
>> is unidentifiable. Should the random effect model really be allowed to
>> manufacture a difference?  That makes me think (1 | state) is the
>> "right model" and "(1 | region/state)" is different only as an
>> artifact.
>>
>> Likelihood ratio test seems to say there is no big difference, which
>> is a relief.  But what if there were?
>>
>> individual.modeld: y ~ (1 | state)
>> individual.modelc: y ~ (1 | regionf/state)
>>                   Df   AIC   BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
>> individual.modeld  2 18889 18904 -9442.6    18885
>> individual.modelc  3 18889 18912 -9441.4    18883 2.2477      1     0.1338
>>
>>
>> More and more, I have the feeling that the random effects model
>> fabricates an ability to estimate quantities that are not really
>> indentifiable.  Estimating state-level-random effects on top of
>> randomly assigned regional scores is only possible because of the
>> technicalities related to shrinkage. Maybe. I'm not sure, that's why
>> I'm asking you what you think.
>>
>> pj
>> --
>> Paul E. Johnson   http://pj.freefaculty.org
>> Director, Center for Research Methods and Data Analysis
>> http://crmda.ku.edu
>>
>> To write to me directly, please address me at pauljohn at ku.edu.
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models