[R-sig-ME] Collinearity diagnostics for (mixed) multinomial models

Fri Feb 25 14:49:49 CET 2022

I am indeed talking about collinearity of the predictors, not the response.
A multinomial model consists of C-1 binary submodels, so it arguably
doesn't make sense to measure collinearity in the entire dataset at once
but, rather, it should be measured separately in the C-1 subdatasets to
which the C-1 submodels are fit. My question is whether the way I propose
to do this (in the original post) is sensible.

Best,

Juho

On Fri, 25 Feb 2022 at 15:19, Sorkin, John (jsorkin using som.umaryland.edu)
wrote:

> I would agree with Steven. Collinearity is a problem with the predictor
> variables, not the outcome variable. Given a multinomial model y = f(x1,
> x2, x3, . . . xn), one could run a linear regression x1 = f(x2, x3,
> . . ., xn) and look at the VIF to determine whether x1 is collinear with
> x2 . . . xn, and perhaps an additional regression x2 = f(x1, x3, . . ., xn)
> to determine whether x2 is collinear with x1, x3, . . . xn. If I am missing
> something, I hope someone will correct me.
>
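John's recipe can be sketched in base R as follows. This is a minimal illustration on simulated data, not code from the thread; `vif_aux()` is a hypothetical helper name, and it assumes numeric predictors with one column each (factors need the generalized VIF). Each predictor is regressed on all the others and VIF_j = 1 / (1 - R^2_j). The last line shows that the same numbers are the diagonal of the inverse correlation matrix of the predictors, which is why the outcome never enters the calculation.

```r
# Hypothetical helper: VIFs from auxiliary regressions, base R only.
# Assumes every predictor is a single numeric column (no factors).
vif_aux <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    # regress predictor v on all the other predictors (with an intercept)
    f  <- reformulate(setdiff(names(X), v), response = v)
    r2 <- summary(lm(f, data = X))$r.squared
    1 / (1 - r2)
  })
}

# Simulated predictors: x3 is almost a linear combination of x1 and x2
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2 + rnorm(n, sd = 0.1)
X  <- data.frame(x1, x2, x3)

vif_aux(X)             # all three are large: each is nearly explained by the others
diag(solve(cor(X)))    # the same values, computed from the predictors alone
```

Dropping `x3` (for example `vif_aux(X[, c("x1", "x2")])`) brings both VIFs back close to 1.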
> John (but not John Fox)
>
>
>
>
>
>
> *From: *stevedrd--- via R-sig-mixed-models
> <r-sig-mixed-models using r-project.org>
> *Sent: *Friday, February 25, 2022 8:07 AM
> *To: *John Fox <jfox using mcmaster.ca>; Juho Kristian Ruohonen
> <juho.kristian.ruohonen using gmail.com>
> *Cc: *r-sig-mixed-models using r-project.org
> *Subject: *Re: [R-sig-ME] Collinearity diagnostics for (mixed)
> multinomial models
>
>
>
> This seems odd to me, but then I don't usually analyze multinomial
> models.  Is there an issue with collinearity in the response variable in a
> multinomial model?  I would think that the levels are collinear by
> definition.  So then the issue, it seems to me, is whether there is
> collinearity in the fixed effects - and that should be independent of the
> response variables.  Could you use the vif() function with a standard
> response (say = 1) to check collinearity in the fixed effects?  I would
> think that your method on the sub-datasets may not capture all of the
> collinearity in the full model.
> But I could be waaaaaaay off base on this.
> SteveDenham
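Steve's point that the diagnostics should not depend on the response can be checked with a small sketch. The hypothetical `gvif_lm()` below computes Fox and Monette's generalized VIF for each term of an `lm` fit from the correlation matrix of the coefficient estimates, which, as we understand it, mirrors what `car::vif()` does for linear models. Two fits with different made-up responses (simulated noise, an assumption of this sketch) give the same GVIFs, because the residual variance cancels when the coefficient covariance matrix is converted to correlations.

```r
# Hypothetical helper: generalized VIF per term of an lm fit, base R only.
# GVIF_term = det(R11) * det(R22) / det(R), where R is the correlation matrix
# of the coefficient estimates (intercept removed) and R11 is the block of
# the columns belonging to that term.
gvif_lm <- function(fit) {
  a    <- attr(model.matrix(fit), "assign")
  keep <- a != 0                                   # drop the intercept
  R    <- cov2cor(vcov(fit)[keep, keep, drop = FALSE])
  a    <- a[keep]
  labs <- attr(terms(fit), "term.labels")
  detR <- det(R)
  gvif <- sapply(seq_along(labs), function(k) {
    s <- which(a == k)
    det(R[s, s, drop = FALSE]) * det(R[-s, -s, drop = FALSE]) / detR
  })
  df <- sapply(seq_along(labs), function(k) sum(a == k))
  data.frame(term = labs, GVIF = gvif, Df = df, GVIF_adj = gvif^(1 / (2 * df)))
}

# Simulated predictors, including a 3-level factor correlated with x1
set.seed(2)
n  <- 300
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
g  <- cut(x1 + rnorm(n, sd = 0.5), breaks = 3, labels = c("lo", "mid", "hi"))
d  <- data.frame(x1, x2, g, y_a = rnorm(n), y_b = runif(n))

# Two unrelated made-up responses, same predictors
fit_a <- lm(y_a ~ x1 + x2 + g, data = d)
fit_b <- lm(y_b ~ x1 + x2 + g, data = d)
gvif_lm(fit_a)
gvif_lm(fit_b)   # same GVIF column: the response drops out
```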
>     On Friday, February 25, 2022, 03:24:15 AM EST, Juho Kristian Ruohonen <
> juho.kristian.ruohonen using gmail.com> wrote:
>
>  Dear John (and anyone else qualified to comment),
>
> I fit lots of mixed-effects multinomial models in my research, and I would
> like to see some (multi)collinearity diagnostics on the fixed effects, of
> which there are over 30. My models are fit using the Bayesian *brms*
> package because I know of no frequentist packages with multinomial GLMM
> compatibility.
>
> With continuous or dichotomous outcomes, my go-to function for calculating
> multicollinearity diagnostics is of course *vif()* from the *car* package.
> As expected, however, this function does not report sensible diagnostics
> for multinomial models -- not even for standard ones fit by the *nnet*
> package's *multinom()* function. The reason, I presume, is that a
> multinomial model is not really one but C-1 regression models (where C is
> the number of response categories) and the *vif()* function is not designed
> to deal with this scenario.
>
> Therefore, in order to obtain meaningful collinearity metrics, my present
> plan is to write a simple helper function that uses *vif()* to calculate
> and present (generalized) variance inflation metrics for the C-1
> sub-datasets to which the C-1 component binomial models of the overall
> multinomial model are fit. In other words, it will partition the data into
> those C-1 subsets and then apply *vif()* to C-1 linear regressions, each
> using a made-up continuous response and the fixed effects of interest.
>
> Does this seem like a sensible approach?
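The plan described above can be sketched in base R. Everything here is hypothetical: `gvif_mm()` and `multinom_subset_gvif()` are illustrative names, the data are simulated, and the reference category defaults to the first factor level, which you would adjust to match the baseline of the fitted model. Because the generalized VIF depends only on the predictors, this version computes it directly from the correlations among the model-matrix columns of each subset (reference category plus one other category) instead of fitting `lm()` to a made-up response. Whether these per-subset diagnostics are the right ones is exactly the question being discussed; the sketch only shows the mechanics.

```r
# Hypothetical helper: generalized VIF per term from the correlation matrix
# of the regressors (intercept excluded), for a one-sided formula `rhs`.
gvif_mm <- function(rhs, data) {
  mm <- model.matrix(rhs, data = data)
  a  <- attr(mm, "assign")
  R  <- cor(mm[, a != 0, drop = FALSE])  # correlations among the regressors
  a  <- a[a != 0]
  labs <- attr(terms(rhs), "term.labels")
  detR <- det(R)
  g <- sapply(seq_along(labs), function(k) {
    s <- which(a == k)
    det(R[s, s, drop = FALSE]) * det(R[-s, -s, drop = FALSE]) / detR
  })
  setNames(g, labs)
}

# Hypothetical helper: one row of GVIFs per "category k vs reference" subset.
multinom_subset_gvif <- function(data, response, rhs, ref = NULL) {
  y <- factor(data[[response]])
  if (is.null(ref)) ref <- levels(y)[1]   # assumption: first level is baseline
  others <- setdiff(levels(y), ref)
  out <- lapply(others, function(k) {
    # keep rows in the reference category or category k; drop unused levels
    sub <- droplevels(data[y %in% c(ref, k), , drop = FALSE])
    gvif_mm(rhs, sub)
  })
  names(out) <- paste(others, "vs", ref)
  do.call(rbind, out)
}

# Simulated example: 3 outcome categories, so C - 1 = 2 subsets
set.seed(3)
n <- 600
d <- data.frame(y  = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
                x1 = rnorm(n),
                g  = factor(sample(c("a", "b", "c"), n, replace = TRUE)))
d$x2 <- 0.6 * d$x1 + rnorm(n)

res <- multinom_subset_gvif(d, "y", ~ x1 + x2 + g)
round(res, 3)
```

A predictor that happens to be constant within one subset has no defined correlation, so that row would come back as `NA`; such cases would need handling before this is used in earnest.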
>
> Best,
>
> Juho
>
>
>
>
> On Mon, 27 Sep 2021 at 19:26, John Fox (jfox using mcmaster.ca) wrote:
>
> > Dear Simon,
> >
> > I believe that Russ's point is that the fact that the additive model
> > allows you to estimate nonsensical quantities like a mean for girls in
> > all-boys' schools implies a problem with the model. Why not do as I
> > suggested and define two dichotomous factors: sex of student
> > (male/female) and type of school (coed, same-sex)? The four combinations
> > of levels then make sense.
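John's recoding can be illustrated on simulated data that mimic the layout of the `tutorial` data (boys' schools contain only boys, girls' schools only girls); the numbers are made up and the variable `type` is a hypothetical name. Crossing `sex` with `schgend` leaves two empty cells, whereas crossing `sex` with a two-level school type fills all four, so the interaction model is of full rank.

```r
# Simulated data with the same empty-cell layout as the tutorial data
set.seed(1)
lv <- c("mixedsch", "boysch", "girlsch")
schgend <- factor(rep(lv, times = c(100, 40, 60)), levels = lv)
sex <- factor(ifelse(schgend == "boysch", "boy",
              ifelse(schgend == "girlsch", "girl",
                     sample(c("boy", "girl"), 200, replace = TRUE))),
              levels = c("boy", "girl"))
d <- data.frame(schgend, sex, y = rnorm(200))

# Two dichotomous factors: sex of student and type of school
d$type <- factor(ifelse(d$schgend == "mixedsch", "coed", "same-sex"),
                 levels = c("coed", "same-sex"))

table(d$sex, d$schgend)   # two empty cells
table(d$sex, d$type)      # all four cells observed

fit <- lm(y ~ sex * type, data = d)
coef(fit)                 # four coefficients, none aliased
```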
> >
> > Best,
> >  John
> >
> > On 2021-09-27 12:09 p.m., Simon Harmel wrote:
> > > Thanks, Russ! There is one thing that I still don't understand. We
> > > have two completely empty cells (boys in girl-only & girls in boy-only
> > > schools). Then, how are the means of those empty cells computed (what
> > > data is used in their place in the additive model)?
> > >
> > > Let's simplify the model for clarity:
> > >
> > > library(R2MLwiN)
> > > library(emmeans)
> > > data("tutorial")
> > >
> > > Form3 <- normexam ~ schgend + sex ## + standlrt + (standlrt | school)
> > > model3 <- lm(Form3, data = tutorial)
> > >
> > > emmeans(model3, pairwise~sex+schgend)$emmeans
> > >
> > >  sex  schgend  emmean    SE  df lower.CL upper.CL
> > >  boy  mixedsch -0.2160 0.0297 4055  -0.2742 -0.15780
> > >  girl mixedsch  0.0248 0.0304 4055  -0.0348  0.08437
> > >  boy  boysch    0.0234 0.0437 4055  -0.0623  0.10897
> > >  girl boysch    0.2641 0.0609 4055   0.1447  0.38360  <- how computed?
> > >  boy  girlsch  -0.0948 0.0502 4055  -0.1931  0.00358  <- how computed?
> > >  girl girlsch  0.1460 0.0267 4055  0.0938  0.19829
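In this additive model, the two questioned means are extrapolations: each is a sum of coefficients for a cell with no data. The table itself shows this. The girl-minus-boy difference in mixed schools is 0.0248 - (-0.2160) = 0.2408, and the additive model imposes that same difference in every school type, so girl/boysch = 0.0234 + 0.2408 = 0.2642 (shown as 0.2641; the difference is rounding) and boy/girlsch = 0.1460 - 0.2408 = -0.0948. The sketch below reproduces the mechanism with `lm()` and `predict()` on simulated data with the same empty-cell layout (made-up numbers, not the `tutorial` data).

```r
# Simulated data: boys' schools contain only boys, girls' schools only girls
set.seed(1)
lv <- c("mixedsch", "boysch", "girlsch")
schgend <- factor(rep(lv, times = c(100, 40, 60)), levels = lv)
sex <- factor(ifelse(schgend == "boysch", "boy",
              ifelse(schgend == "girlsch", "girl",
                     sample(c("boy", "girl"), 200, replace = TRUE))),
              levels = c("boy", "girl"))
d <- data.frame(schgend, sex, y = rnorm(200))

fit_add <- lm(y ~ schgend + sex, data = d)
b <- coef(fit_add)

# Predicted mean for girls in boys' schools, a cell with no data at all
new  <- data.frame(schgend = factor("boysch", levels = lv),
                   sex = factor("girl", levels = c("boy", "girl")))
pred <- predict(fit_add, newdata = new)

# ...which is nothing more than a sum of coefficients
manual <- b[["(Intercept)"]] + b[["schgendboysch"]] + b[["sexgirl"]]
c(pred = unname(pred), manual = manual)
```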
> > >
> > >
> > >
> > >
> > >
> > > On Sun, Sep 26, 2021 at 8:22 PM Lenth, Russell V
> > > <russell-lenth using uiowa.edu> wrote:
> > >>
> > >> By the way, returning to the topic of interpreting coefficients, you
> > >> ought to have fun with the ones from the model I just fitted:
> > >>
> > >> Fixed effects:
> > >>                Estimate Std. Error t value
> > >> (Intercept)    -0.18882    0.05135  -3.677
> > >> standlrt        0.55442    0.01994  27.807
> > >> schgendboysch  0.17986    0.09915  1.814
> > >> schgendgirlsch  0.17482    0.07877  2.219
> > >> sexgirl        0.16826    0.03382  4.975
> > >>
> > >> One curious thing you'll notice is that there are no coefficients for
> > >> the interaction terms. Why? Because those terms were "thrown out" of
> > >> the model, and so they are not shown. I think it is unwise not to show
> > >> what was thrown out (e.g., lm would have shown them as NAs), because in
> > >> fact what we see is but one of infinitely many possible solutions to
> > >> the regression equations. This is the solution where the last two
> > >> coefficients (the two interaction terms) are constrained to zero. There
> > >> is another equally reasonable one where the coefficient for
> > >> schgendgirlsch is constrained to zero and the schgendgirlsch:sexgirl
> > >> interaction takes over its role; the schgendboysch:sexgirl coefficient
> > >> is arbitrary in every solution, because no girls attend boys' schools.
> > >> And infinitely more where all 7 coefficients are non-zero, and there
> > >> are two linear constraints among them.
> > >>
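The aliasing Russ describes can be seen directly in the model matrix. The sketch uses simulated data with the same empty-cell layout as the `tutorial` data and a plain `lm()` (no `standlrt`, no random effects), so it is an illustration rather than a reproduction of his fit. One interaction column is identically zero and the other duplicates `schgendgirlsch`, so the six columns have rank 4, and `lm()` reports the two interaction coefficients as `NA`.

```r
# Simulated data: boys' schools contain only boys, girls' schools only girls
set.seed(1)
lv <- c("mixedsch", "boysch", "girlsch")
schgend <- factor(rep(lv, times = c(100, 40, 60)), levels = lv)
sex <- factor(ifelse(schgend == "boysch", "boy",
              ifelse(schgend == "girlsch", "girl",
                     sample(c("boy", "girl"), 200, replace = TRUE))),
              levels = c("boy", "girl"))
d <- data.frame(schgend, sex, y = rnorm(200))

X <- model.matrix(~ schgend * sex, data = d)
colnames(X)

# No girls in boys' schools: this interaction column is all zeros
all(X[, "schgendboysch:sexgirl"] == 0)
# Everyone in a girls' school is a girl: this column duplicates schgendgirlsch
all(X[, "schgendgirlsch:sexgirl"] == X[, "schgendgirlsch"])

qr(X)$rank      # 4, although X has 6 columns

fit_int <- lm(y ~ schgend * sex, data = d)
coef(fit_int)   # the two interaction coefficients are reported as NA
```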
> > >> Of course, since the particular solution shown consists of all the main
> > >> effects, with the interactions constrained to zero, it does demonstrate
> > >> that the additive model *could* have been used to obtain the same
> > >> estimates and standard errors, and you can see that by comparing the
> > >> results (and ignoring the invalid ones from the additive model). But it
> > >> is just a lucky coincidence that it worked out this way, and the
> > >> additive model did lead us down a primrose path containing silly
> > >> results among the correct ones.
> > >>
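On simulated data with the same empty-cell layout (again `lm()` rather than `lmer()`, as a simplifying assumption), the additive and interaction fits can be compared directly. With this pattern of empty cells the interaction columns add nothing to the column space of the additive model, so the fitted values, the non-aliased estimates and their standard errors all coincide.

```r
# Simulated data: boys' schools contain only boys, girls' schools only girls
set.seed(1)
lv <- c("mixedsch", "boysch", "girlsch")
schgend <- factor(rep(lv, times = c(100, 40, 60)), levels = lv)
sex <- factor(ifelse(schgend == "boysch", "boy",
              ifelse(schgend == "girlsch", "girl",
                     sample(c("boy", "girl"), 200, replace = TRUE))),
              levels = c("boy", "girl"))
d <- data.frame(schgend, sex, y = rnorm(200))

fit_add <- lm(y ~ schgend + sex, data = d)
fit_int <- lm(y ~ schgend * sex, data = d)

# Same fitted values
all.equal(fitted(fit_add), fitted(fit_int))

# Same estimates once the aliased (NA) interaction coefficients are dropped
b_int <- coef(fit_int)
all.equal(coef(fit_add), b_int[!is.na(b_int)])

# Same standard errors (summary() omits the aliased rows)
se_add <- coef(summary(fit_add))[, "Std. Error"]
se_int <- coef(summary(fit_int))[, "Std. Error"]
all.equal(se_add, se_int)
```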
> > >> Russ
> > >>
> > >> -----Original Message-----
> > >> From: Lenth, Russell V
> > >> Sent: Sunday, September 26, 2021 7:43 PM
> > >> To: Simon Harmel <sim.harmel using gmail.com>
> > >> Cc: r-sig-mixed-models using r-project.org
> > >> Subject: RE: [External] Re: [R-sig-ME] Help with interpreting one fixed-effect coefficient
> > >>
> > >> I guess correctness is in the eyes of the beholder. But I think this
> > >> illustrates the folly of the additive model. Having additive effects
> > >> suggests a belief that you can vary one factor more or less
> > >> independently of the other. In his comments, John Fox makes a good point
> > >> that escaped my earlier cursory reading of the original question: you
> > >> don't have data on girls attending all-boys' schools, nor on boys
> > >> attending all-girls' schools, yet the model that was fitted estimates a
> > >> mean response for both of those situations. That's a pretty clear
> > >> testament to the failure of that model, and it also explains why the
> > >> coefficients don't make sense, and finally why we have estimates of 15
> > >> comparisons (some of which are aliased with one another) when only 6 of
> > >> them make sense.
> > >>
> > >> If, instead, a model with interaction were fitted, it would be a
> > >> rank-deficient model because two cells are empty. Perhaps there is some
> > >> sort of nesting structure that could be used to work around that.
> > >> However, it doesn't matter much, because emmeans assesses estimability,
> > >> and the two combinations I mentioned above would be flagged as
> > >> non-estimable. One could then more judiciously use the contrast()
> > >> function to test meaningful contrasts across this irregular array of
> > >> cell means. Even if you injudiciously ask for all pairwise comparisons,
> > >> you will see 6 estimable ones and 9 non-estimable ones. See the output
> > >> below.
> > >>
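The estimability check mentioned above rests on a textbook criterion: a linear function lambda' beta is estimable exactly when lambda lies in the row space of the model matrix X. emmeans relies on Lenth's estimability package for this; the hypothetical `is_estimable()` below is not that package's code, just a minimal sketch of the criterion using a QR decomposition of t(X), on simulated data with the same empty-cell layout. It flags the same two school-type/sex combinations that the emmeans output marks `nonEst`.

```r
# Simulated data: boys' schools contain only boys, girls' schools only girls
set.seed(1)
lv <- c("mixedsch", "boysch", "girlsch")
schgend <- factor(rep(lv, times = c(100, 40, 60)), levels = lv)
sex <- factor(ifelse(schgend == "boysch", "boy",
              ifelse(schgend == "girlsch", "girl",
                     sample(c("boy", "girl"), 200, replace = TRUE))),
              levels = c("boy", "girl"))
d <- data.frame(schgend, sex, y = rnorm(200))
X <- model.matrix(~ schgend * sex, data = d)

# Hypothetical helper: lambda is estimable iff projecting it onto the row
# space of the design (the column space of t(design)) leaves no residual.
is_estimable <- function(lambda, design, tol = 1e-8) {
  r <- qr.resid(qr(t(design)), lambda)
  sqrt(sum(r^2)) <= tol * max(1, sqrt(sum(lambda^2)))
}

# One row per school-type x sex combination, like an emmeans reference grid
grid <- expand.grid(schgend = factor(lv, levels = lv),
                    sex = factor(c("boy", "girl"), levels = c("boy", "girl")))
L <- model.matrix(~ schgend * sex, data = grid)
est <- apply(L, 1, is_estimable, design = X)
cbind(grid, estimable = est)
```

The parameter is named `design` rather than `X` on purpose: `apply()` has its own argument called `X`, so passing `X = X` through `...` would not reach the helper.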
> > >> Russ
> > >>
> > >> ----- Interaction model -----
> > >>
> > >>> Form <- normexam ~ 1 + standlrt + schgend * sex + (standlrt | school)
> > >>> model <- lmer(Form, data = tutorial, REML = FALSE)
> > >> fixed-effect model matrix is rank deficient so dropping 2 columns / coefficients
> > >>>
> > >>> emmeans(model, pairwise~schgend+sex)
> > >>
> > >> ... messages deleted ...
> > >>
> > >> $emmeans
> > >>  schgend  sex    emmean    SE  df asymp.LCL asymp.UCL
> > >>  mixedsch boy  -0.18781 0.0514 Inf  -0.2885  -0.0871
> > >>  boysch  boy  -0.00795 0.0880 Inf  -0.1805    0.1646
> > >>  girlsch  boy    nonEst    NA  NA        NA        NA
> > >>  mixedsch girl -0.01955 0.0521 Inf  -0.1216    0.0825
> > >>  boysch  girl  nonEst    NA  NA        NA        NA
> > >>  girlsch  girl  0.15527 0.0632 Inf    0.0313    0.2792
> > >>
> > >> Degrees-of-freedom method: asymptotic
> > >> Confidence level used: 0.95
> > >>
> > >> $contrasts
> > >>  contrast                    estimate    SE  df z.ratio p.value
> > >>  mixedsch boy - boysch boy    -0.1799 0.0991 Inf  -1.814  0.4565
> > >>  mixedsch boy - girlsch boy    nonEst    NA  NA      NA      NA
> > >>  mixedsch boy - mixedsch girl  -0.1683 0.0338 Inf  -4.975  <.0001
> > >>  mixedsch boy - boysch girl    nonEst    NA  NA      NA      NA
> > >>  mixedsch boy - girlsch girl  -0.3431 0.0780 Inf  -4.396  0.0002
> > >>  boysch boy - girlsch boy      nonEst    NA  NA      NA      NA
> > >>  boysch boy - mixedsch girl    0.0116 0.0997 Inf  0.116  1.0000
> > >>  boysch boy - boysch girl      nonEst    NA  NA      NA      NA
> > >>  boysch boy - girlsch girl    -0.1632 0.1058 Inf  -1.543  0.6361
> > >>  girlsch boy - mixedsch girl    nonEst    NA  NA      NA      NA
> > >>  girlsch boy - boysch girl      nonEst    NA  NA      NA      NA
> > >>  girlsch boy - girlsch girl    nonEst    NA  NA      NA      NA
> > >>  mixedsch girl - boysch girl    nonEst    NA  NA      NA      NA
> > >>  mixedsch girl - girlsch girl  -0.1748 0.0788 Inf  -2.219  0.2287
> > >>  boysch girl - girlsch girl    nonEst    NA  NA      NA      NA
> > >>
> > >> Degrees-of-freedom method: asymptotic
> > >> P value adjustment: tukey method for comparing a family of 6 estimates
> > >>
> > >>
> > >> ---------------------------------------------------------
> > >> From: Simon Harmel <sim.harmel using gmail.com>
> > >> Sent: Sunday, September 26, 2021 3:08 PM
> > >> To: Lenth, Russell V <russell-lenth using uiowa.edu>
> > >> Cc: r-sig-mixed-models using r-project.org
> > >> Subject: [External] Re: [R-sig-ME] Help with interpreting one fixed-effect coefficient
> > >>
> > >> Dear Russ and the List Members,
> > >>
> > >> If we use Russ' great package (emmeans), we see that, although
> > >> meaningless, "schgendgirl-only" can be interpreted using the logic I
> > >> mentioned here:
> > >> https://stat.ethz.ch/pipermail/r-sig-mixed-models/2021q3/029723.html
> > >>
> > >> That is, "schgendgirl-only" can meaninglessly mean: ***diff. bet. boys
> > >> in girl-only vs. mixed schools*** just like it can meaningfully mean:
> > >> ***diff. bet. girls in girl-only vs. mixed schools***
> > >>
> > >> Russ, have I used emmeans correctly?
> > >>
> > >> Simon
> > >>
> > >> Here is reproducible code:
> > >>
> > >> library(R2MLwiN) # For the dataset
> > >> library(lme4)
> > >> library(emmeans)
> > >>
> > >> data("tutorial")
> > >>
> > >> Form <- normexam ~ 1 + standlrt + schgend + sex + (standlrt | school)
> > >> model <- lmer(Form, data = tutorial, REML = FALSE)
> > >>
> > >> emmeans(model, pairwise~schgend+sex)$contrast
> > >>
> > >> contrast                    estimate    SE  df z.ratio p.value
> > >> mixedsch boy - boysch boy    -0.17986 0.0991 Inf -1.814  0.4565
> > >> mixedsch boy - girlsch boy  -0.17482 0.0788 Inf -2.219  0.2287  <--This coef. equals
> > >> mixedsch boy - mixedsch girl -0.16826 0.0338 Inf -4.975  <.0001
> > >> mixedsch boy - boysch girl  -0.34813 0.1096 Inf -3.178  0.0186
> > >> mixedsch boy - girlsch girl  -0.34308 0.0780 Inf -4.396  0.0002
> > >> boysch boy - girlsch boy      0.00505 0.1110 Inf  0.045  1.0000
> > >> boysch boy - mixedsch girl    0.01160 0.0997 Inf  0.116  1.0000
> > >> boysch boy - boysch girl    -0.16826 0.0338 Inf -4.975  <.0001
> > >> boysch boy - girlsch girl    -0.16322 0.1058 Inf -1.543  0.6361
> > >> girlsch boy - mixedsch girl  0.00656 0.0928 Inf  0.071  1.0000
> > >> girlsch boy - boysch girl    -0.17331 0.1255 Inf -1.381  0.7388
> > >> girlsch boy - girlsch girl  -0.16826 0.0338 Inf -4.975  <.0001
> > >> mixedsch girl - boysch girl  -0.17986 0.0991 Inf -1.814  0.4565
> > >> mixedsch girl - girlsch girl -0.17482 0.0788 Inf -2.219  0.2287  <--This coef.
> > >> boysch girl - girlsch girl    0.00505 0.1110 Inf  0.045  1.0000
> > >>
> > >>
> > >
> > > _______________________________________________
> > > R-sig-mixed-models using r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> > >
> > --
> > John Fox, Professor Emeritus
> > McMaster University
> > Hamilton, Ontario, Canada
> > web: https://socialsciences.mcmaster.ca/jfox/
> >