[R-sig-ME] Collinearity diagnostics for (mixed) multinomial models

Fri Feb 25 09:23:25 CET 2022

Dear John (and anyone else qualified to comment),

I fit lots of mixed-effects multinomial models in my research, and I would
like to see some (multi)collinearity diagnostics on the fixed effects, of
which there are over 30. My models are fit using the Bayesian *brms*
package because I know of no frequentist packages with multinomial GLMM
compatibility.

With continuous or dichotomous outcomes, my go-to function for calculating
multicollinearity diagnostics is of course *vif()* from the *car* package.
As expected, however, this function does not report sensible diagnostics
for multinomial models -- not even for standard ones fit by the *nnet*
package's *multinom()* function. The reason, I presume, is because a
multinomial model is not really one but C-1 regression models  (where C is
the number of response categories) and the *vif()* function is not designed
to deal with this scenario.

Therefore, in order to obtain meaningful collinearity metrics, my present
plan is to write a simple helper function that uses *vif() *to calculate
and present (generalized) variance inflation metrics for the C-1
sub-datasets to which the C-1 component binomial models of the overall
multinomial model are fit. In other words, it will partition the data into
those C-1 subsets, and then apply *vif()* to as many linear regressions
using a made-up continuous response and the fixed effects of interest.

Does this seem like a sensible approach?

Best,

Juho

ma 27. syysk. 2021 klo 19.26 John Fox (jfox using mcmaster.ca) kirjoitti:

> Dear Simon,
>
> I believe that Russ's point is that the fact that the additive model
> allows you to estimate nonsensical quantities like a mean for girls in
> all-boys' schools implies a problem with the model. Why not do as I
> suggested and define two dichotomous factors: sex of student
> (male/female) and type of school (coed, same-sex)? The four combinations
> of levels then make sense.
>
> Best,
>   John
>
> On 2021-09-27 12:09 p.m., Simon Harmel wrote:
> > Thanks, Russ! There is one thing that I still don't understand. We
> > have two completely empty cells (boys in girl-only & girls in boy-only
> > schools). Then, how are the means of those empty cells computed (what
> > data is used in their place in the additive model)?
> >
> > Let's' simplify the model for clarity:
> >
> > library(R2MLwiN)
> > library(emmeans)
> >
> > Form3 <- normexam ~ schgend + sex ## + standlrt + (standlrt | school)
> > model3 <- lm(Form3, data = tutorial)
> >
> > emmeans(model3, pairwise~sex+schgend)$emmeans
> >
> >   sex  schgend   emmean     SE   df lower.CL upper.CL
> >   boy  mixedsch -0.2160 0.0297 4055  -0.2742 -0.15780
> >   girl mixedsch  0.0248 0.0304 4055  -0.0348  0.08437
> >   boy  boysch    0.0234 0.0437 4055  -0.0623  0.10897
> >   girl boysch    0.2641 0.0609 4055   0.1447  0.38360<-how computed?
> >   boy  girlsch  -0.0948 0.0502 4055  -0.1931  0.00358<-how computed?
> >   girl girlsch   0.1460 0.0267 4055   0.0938  0.19829
> >
> >
> >
> >
> >
> > On Sun, Sep 26, 2021 at 8:22 PM Lenth, Russell V
> > <russell-lenth using uiowa.edu> wrote:
> >>
> >> By the way, returning to the topic of interpreting coefficients, you
> ought to have fun with the ones from the model I just fitted:
> >>
> >> Fixed effects:
> >>                 Estimate Std. Error t value
> >> (Intercept)    -0.18882    0.05135  -3.677
> >> standlrt        0.55442    0.01994  27.807
> >> schgendboysch   0.17986    0.09915   1.814
> >> schgendgirlsch  0.17482    0.07877   2.219
> >> sexgirl         0.16826    0.03382   4.975
> >>
> >> One curious thing you'll notice is that there are no coefficients for
> the interaction terms. Why? Because those terms were "thrown out" of the
> model, and so they are not shown. I think it is unwise to not show what was
> thrown out (e.g., lm would have shown them as NAs), because in fact what we
> see is but one of infinitely many possible solutions to the regression
> equations. This is the solution where the last two coefficients are
> constrained to zero. There is another equally reasonable one where the
> coefficients for schgendboysch and schgendgirlsch  are constrained to zero,
> and the two interaction effects would then be non-zero. And infinitely more
> where all 7 coefficients are non-zero, and there are two linear constraints
> among them.
> >>
> >> Of course, since the particular estimate shown consists of all the main
> effects and interactions are constrained to zero, it does demonstrate that
> the additive model *could* have been used to obtain the same estimates and
> standard errors, and you can see that by comparing the results (and
> ignoring the invalid ones from the additive model). But it is just a lucky
> coincidence that it worked out this way, and the additive model did lead us
> down a primrose path containing silly results among the correct ones.
> >>
> >> Russ
> >>
> >> -----Original Message-----
> >> From: Lenth, Russell V
> >> Sent: Sunday, September 26, 2021 7:43 PM
> >> To: Simon Harmel <sim.harmel using gmail.com>
> >> Cc: r-sig-mixed-models using r-project.org
> >> Subject: RE: [External] Re: [R-sig-ME] Help with interpreting one
> fixed-effect coefficient
> >>
> >> I guess correctness is in the eyes of the beholder. But I think this
> illustrates the folly of the additive model. Having additive effects
> suggests a belief that you can vary one factor more or less independently
> of the other. In his comments, John Fox makes a good point that escaped my
> earlier cursory view of the original question, that you don't have data on
> girls attending all-boys' schools, nor boys attending all-girls' schools;
> yet the model that was fitted estimates a mean response for both those
> situations. That's a pretty clear testament to the failure of that model –
> and also why the coefficients don't make sense. And finally why we have
> estimates of 15 comparisons (some of which are aliased with one another),
> when only 6 of them make sense.
> >>
> >> If instead, a model with interaction were fitted, it would be a
> rank-deficient model because two cells are empty. Perhaps there is some
> sort of nesting structure that could be used to work around that. However,
> it doesn't matter much because emmeans assesses estimability, and the two
> combinations I mentioned above would be flagged as non-estimable. One could
> then more judiciously use the contrast function to test meaningful
> contrasts across this irregular array of cell means. Or even injudiciously
> asking for all pairwise comparisons, you will see 6 estimable ones and 9
> non-estimable ones. See output below.
> >>
> >> Russ
> >>
> >> ----- Interactive model -----
> >>
> >>> Form <- normexam ~ 1 + standlrt + schgend * sex + (standlrt | school)
> >>> model <- lmer(Form, data = tutorial, REML = FALSE)
> >> fixed-effect model matrix is rank deficient so dropping 2 columns /
> coefficients
> >>>
> >>> emmeans(model, pairwise~schgend+sex)
> >>
> >> ... messages deleted ...
> >>
> >> $emmeans
> >>   schgend  sex    emmean     SE  df asymp.LCL asymp.UCL
> >>   mixedsch boy  -0.18781 0.0514 Inf   -0.2885   -0.0871
> >>   boysch   boy  -0.00795 0.0880 Inf   -0.1805    0.1646
> >>   girlsch  boy    nonEst     NA  NA        NA        NA
> >>   mixedsch girl -0.01955 0.0521 Inf   -0.1216    0.0825
> >>   boysch   girl   nonEst     NA  NA        NA        NA
> >>   girlsch  girl  0.15527 0.0632 Inf    0.0313    0.2792
> >>
> >> Degrees-of-freedom method: asymptotic
> >> Confidence level used: 0.95
> >>
> >> $contrasts
> >>   contrast                     estimate     SE  df z.ratio p.value
> >>   mixedsch boy - boysch boy     -0.1799 0.0991 Inf  -1.814  0.4565
> >>   mixedsch boy - girlsch boy     nonEst     NA  NA      NA      NA
> >>   mixedsch boy - mixedsch girl  -0.1683 0.0338 Inf  -4.975  <.0001
> >>   mixedsch boy - boysch girl     nonEst     NA  NA      NA      NA
> >>   mixedsch boy - girlsch girl   -0.3431 0.0780 Inf  -4.396  0.0002
> >>   boysch boy - girlsch boy       nonEst     NA  NA      NA      NA
> >>   boysch boy - mixedsch girl     0.0116 0.0997 Inf   0.116  1.0000
> >>   boysch boy - boysch girl       nonEst     NA  NA      NA      NA
> >>   boysch boy - girlsch girl     -0.1632 0.1058 Inf  -1.543  0.6361
> >>   girlsch boy - mixedsch girl    nonEst     NA  NA      NA      NA
> >>   girlsch boy - boysch girl      nonEst     NA  NA      NA      NA
> >>   girlsch boy - girlsch girl     nonEst     NA  NA      NA      NA
> >>   mixedsch girl - boysch girl    nonEst     NA  NA      NA      NA
> >>   mixedsch girl - girlsch girl  -0.1748 0.0788 Inf  -2.219  0.2287
> >>   boysch girl - girlsch girl     nonEst     NA  NA      NA      NA
> >>
> >> Degrees-of-freedom method: asymptotic
> >> P value adjustment: tukey method for comparing a family of 6 estimates
> >>
> >>
> >> ---------------------------------------------------------
> >> From: Simon Harmel <sim.harmel using gmail.com>
> >> Sent: Sunday, September 26, 2021 3:08 PM
> >> To: Lenth, Russell V <russell-lenth using uiowa.edu>
> >> Cc: r-sig-mixed-models using r-project.org
> >> Subject: [External] Re: [R-sig-ME] Help with interpreting one
> fixed-effect coefficient
> >>
> >> Dear Russ and the List Members,
> >>
> >> If we use Russ' great package (emmeans), we see that although
> meaningless, but "schgendgirl-only" can be interpreted using the logic I
> mentioned here:
> https://stat.ethz.ch/pipermail/r-sig-mixed-models/2021q3/029723.html .
> >>
> >> That is, "schgendgirl-only" can meaninglessly mean: ***diff. bet. boys
> in girl-only vs. mixed schools*** just like it can meaningfully mean:
> ***diff. bet. girls in girl-only vs. mixed schools***
> >>
> >> Russ, have I used emmeans correctly?
> >>
> >> Simon
> >>
> >> Here is a reproducible code:
> >>
> >> library(R2MLwiN) # For the dataset
> >> library(lme4)
> >> library(emmeans)
> >>
> >> data("tutorial")
> >>
> >> Form <- normexam ~ 1 + standlrt + schgend + sex + (standlrt | school)
> >> model <- lmer(Form, data = tutorial, REML = FALSE)
> >>
> >> emmeans(model, pairwise~schgend+sex)$contrast
> >>
> >> contrast                     estimate     SE  df z.ratio p.value
> >> mixedsch boy - boysch boy    -0.17986 0.0991 Inf -1.814  0.4565
> >> mixedsch boy - girlsch boy   -0.17482 0.0788 Inf -2.219  0.2287
>  <--This coef. equals
> >> mixedsch boy - mixedsch girl -0.16826 0.0338 Inf -4.975  <.0001
> >> mixedsch boy - boysch girl   -0.34813 0.1096 Inf -3.178  0.0186
> >> mixedsch boy - girlsch girl  -0.34308 0.0780 Inf -4.396  0.0002
> >> boysch boy - girlsch boy      0.00505 0.1110 Inf  0.045  1.0000
> >> boysch boy - mixedsch girl    0.01160 0.0997 Inf  0.116  1.0000
> >> boysch boy - boysch girl     -0.16826 0.0338 Inf -4.975  <.0001
> >> boysch boy - girlsch girl    -0.16322 0.1058 Inf -1.543  0.6361
> >> girlsch boy - mixedsch girl   0.00656 0.0928 Inf  0.071  1.0000
> >> girlsch boy - boysch girl    -0.17331 0.1255 Inf -1.381  0.7388
> >> girlsch boy - girlsch girl   -0.16826 0.0338 Inf -4.975  <.0001
> >> mixedsch girl - boysch girl  -0.17986 0.0991 Inf -1.814  0.4565
> >> mixedsch girl - girlsch girl -0.17482 0.0788 Inf -2.219  0.2287
>  <--This coef.
> >> boysch girl - girlsch girl    0.00505 0.1110 Inf  0.045  1.0000
> >>
> >>
> >
> > _______________________________________________
> > R-sig-mixed-models using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >
> --
> John Fox, Professor Emeritus
> McMaster University
> Hamilton, Ontario, Canada
> web: https://socialsciences.mcmaster.ca/jfox/
>
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>

	[[alternative HTML version deleted]]