[R-sig-ME] Collinearity diagnostics for (mixed) multinomial models
Sorkin, John
j@ork|n @end|ng |rom @om@um@ry|@nd@edu
Fri Feb 25 14:18:57 CET 2022
I would agree with Steven. Collinearity is a problem with the predictor variables, not the outcome variable. Given a multinomial model y = f(x1, x2, x3, ..., xn), one could run a simple linear regression x1 = f(x2, x3, ..., xn) and look at the VIF to determine whether x1 is collinear with x2, ..., xn, and perhaps an additional regression x2 = f(x1, x3, ..., xn) to do the same for x2, and so on for the remaining predictors. If I am missing something, I hope someone will correct me.
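The auxiliary-regression idea above can be sketched in base R. This is only an illustration under stated assumptions: the predictors x1, x2, x3 are simulated toy data (x3 is deliberately built to track x1), and aux_vif() is a hypothetical helper name, not a function from any package. It uses the standard identity VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing predictor j on the other predictors.

```r
# Hypothetical toy predictors; x3 is constructed to be nearly collinear with x1.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + rnorm(n, sd = 0.3)
X  <- data.frame(x1, x2, x3)

# aux_vif() is an illustrative helper, not part of any package.
# For each predictor j, regress it on all the others and return 1 / (1 - R^2_j).
aux_vif <- function(X) {
  sapply(names(X), function(j) {
    others <- setdiff(names(X), j)
    fit    <- lm(reformulate(others, response = j), data = X)
    1 / (1 - summary(fit)$r.squared)
  })
}

vifs <- aux_vif(X)
print(round(vifs, 2))
```

Note that the outcome y never enters the calculation, which is the point being made: the diagnostic is a property of the predictors alone.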
John (but not John Fox)
From: stevedrd--- via R-sig-mixed-models<mailto:r-sig-mixed-models using r-project.org>
Sent: Friday, February 25, 2022 8:07 AM
To: John Fox<mailto:jfox using mcmaster.ca>; Juho Kristian Ruohonen<mailto:juho.kristian.ruohonen using gmail.com>
Cc: r-sig-mixed-models using r-project.org<mailto:r-sig-mixed-models using r-project.org>
Subject: Re: [R-sig-ME] Collinearity diagnostics for (mixed) multinomial models
This seems odd to me, but then I don't usually analyze multinomial models. Is there an issue with collinearity in the response variable in a multinomial model? I would think that the levels are collinear by definition. So then the issue, it seems to me, is whether there is collinearity in the fixed effects - and that should be independent of the response variables. Could you use the vif() function with a standard response (say = 1) to check collinearity in the fixed effects? I would think that your method on the sub datasets may not capture all of the collinearity in the full model.
But I could be waaaaaaay off base on this.
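A small base-R sketch of the placeholder-response idea, under stated assumptions: the data are simulated, vif_from_fit() is a hypothetical helper name, and the calculation (the diagonal of the inverse of the coefficient correlation matrix, intercept excluded) applies to single-coefficient numeric terms only. The sketch uses random noise rather than a constant as the placeholder, since a constant response leaves no residual variance for the fit to work with. Two very different made-up responses give the same values, because the residual variance cancels when the covariance matrix is rescaled to a correlation matrix.

```r
# Simulated predictors (illustrative only); x3 depends on x1 and x2.
set.seed(2)
n   <- 150
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$x3 <- dat$x1 - dat$x2 + rnorm(n, sd = 0.5)

# vif_from_fit() is an illustrative helper, not part of any package.
vif_from_fit <- function(fit) {
  v <- vcov(fit)[-1, -1]      # drop the intercept row and column
  diag(solve(cov2cor(v)))     # sigma^2 cancels inside cov2cor()
}

# Two unrelated placeholder responses on very different scales.
dat$y_a <- rnorm(n)
dat$y_b <- runif(n, 0, 100)

vif_a <- vif_from_fit(lm(y_a ~ x1 + x2 + x3, data = dat))
vif_b <- vif_from_fit(lm(y_b ~ x1 + x2 + x3, data = dat))
print(round(rbind(vif_a, vif_b), 3))
```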
SteveDenham
On Friday, February 25, 2022, 03:24:15 AM EST, Juho Kristian Ruohonen <juho.kristian.ruohonen using gmail.com> wrote:
Dear John (and anyone else qualified to comment),
I fit lots of mixed-effects multinomial models in my research, and I would
like to see some (multi)collinearity diagnostics on the fixed effects, of
which there are over 30. My models are fit using the Bayesian *brms*
package because I know of no frequentist packages with multinomial GLMM
compatibility.
With continuous or dichotomous outcomes, my go-to function for calculating
multicollinearity diagnostics is of course *vif()* from the *car* package.
As expected, however, this function does not report sensible diagnostics
for multinomial models -- not even for standard ones fit by the *nnet*
package's *multinom()* function. The reason, I presume, is that a
multinomial model is not really one but C-1 regression models (where C is
the number of response categories) and the *vif()* function is not designed
to deal with this scenario.
Therefore, in order to obtain meaningful collinearity metrics, my present
plan is to write a simple helper function that uses *vif()* to calculate
and present (generalized) variance inflation metrics for the C-1
sub-datasets to which the C-1 component binomial models of the overall
multinomial model are fit. In other words, it will partition the data into
those C-1 subsets, and then apply *vif()* to as many linear regressions
using a made-up continuous response and the fixed effects of interest.
Does this seem like a sensible approach?
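A rough sketch of what such a helper might look like, with several assumptions made loud: the data are simulated, the first factor level is taken as the reference category, subset_vifs() is a hypothetical name, and only numeric predictors are handled (factor terms would need generalized VIFs). Instead of calling *vif()* on a regression with a made-up response, the sketch computes the same quantity directly as the diagonal of the inverse predictor correlation matrix within each subset.

```r
# Simulated data: a 3-category outcome and three numeric predictors,
# with x3 deliberately built to track x1.
set.seed(3)
n   <- 300
dat <- data.frame(
  y  = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  x1 = rnorm(n),
  x2 = rnorm(n)
)
dat$x3 <- dat$x1 + rnorm(n, sd = 0.4)

# subset_vifs() is an illustrative helper, not part of any package.
# For each non-reference category k, keep the rows in {reference, k}
# and compute the VIFs of the predictors within that subset.
subset_vifs <- function(data, response, predictors) {
  lev <- levels(data[[response]])
  ref <- lev[1]                               # assumed reference category
  out <- sapply(lev[-1], function(k) {
    keep <- data[[response]] %in% c(ref, k)
    sub  <- data[keep, predictors, drop = FALSE]
    diag(solve(cor(sub)))                     # VIFs for numeric predictors
  })
  t(out)                                      # one row per non-reference category
}

vif_tab <- subset_vifs(dat, "y", c("x1", "x2", "x3"))
print(round(vif_tab, 2))
```

Each row of the result corresponds to one of the C-1 sub-datasets, so collinearity that appears only in some subsets would show up as differences between rows.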
Best,
Juho
On Mon, 27 Sep 2021 at 19:26, John Fox (jfox using mcmaster.ca) wrote:
> Dear Simon,
>
> I believe that Russ's point is that the fact that the additive model
> allows you to estimate nonsensical quantities like a mean for girls in
> all-boys' schools implies a problem with the model. Why not do as I
> suggested and define two dichotomous factors: sex of student
> (male/female) and type of school (coed, same-sex)? The four combinations
> of levels then make sense.
>
> Best,
> John
>
> On 2021-09-27 12:09 p.m., Simon Harmel wrote:
> > Thanks, Russ! There is one thing that I still don't understand. We
> > have two completely empty cells (boys in girl-only & girls in boy-only
> > schools). Then, how are the means of those empty cells computed (what
> > data is used in their place in the additive model)?
> >
> > Let's simplify the model for clarity:
> >
> > library(R2MLwiN)
> > library(emmeans)
> >
> > Form3 <- normexam ~ schgend + sex ## + standlrt + (standlrt | school)
> > model3 <- lm(Form3, data = tutorial)
> >
> > emmeans(model3, pairwise~sex+schgend)$emmeans
> >
> > sex  schgend   emmean     SE   df lower.CL upper.CL
> > boy  mixedsch -0.2160 0.0297 4055  -0.2742 -0.15780
> > girl mixedsch  0.0248 0.0304 4055  -0.0348  0.08437
> > boy  boysch    0.0234 0.0437 4055  -0.0623  0.10897
> > girl boysch    0.2641 0.0609 4055   0.1447  0.38360  <- how computed?
> > boy  girlsch  -0.0948 0.0502 4055  -0.1931  0.00358  <- how computed?
> > girl girlsch   0.1460 0.0267 4055   0.0938  0.19829
> >
> >
> >
> >
> >
> > On Sun, Sep 26, 2021 at 8:22 PM Lenth, Russell V
> > <russell-lenth using uiowa.edu> wrote:
> >>
> >> By the way, returning to the topic of interpreting coefficients, you
> ought to have fun with the ones from the model I just fitted:
> >>
> >> Fixed effects:
> >>                Estimate Std. Error t value
> >> (Intercept)    -0.18882    0.05135  -3.677
> >> standlrt        0.55442    0.01994  27.807
> >> schgendboysch   0.17986    0.09915   1.814
> >> schgendgirlsch  0.17482    0.07877   2.219
> >> sexgirl         0.16826    0.03382   4.975
> >>
> >> One curious thing you'll notice is that there are no coefficients for
> the interaction terms. Why? Because those terms were "thrown out" of the
> model, and so they are not shown. I think it is unwise to not show what was
> thrown out (e.g., lm would have shown them as NAs), because in fact what we
> see is but one of infinitely many possible solutions to the regression
> equations. This is the solution where the last two coefficients are
> constrained to zero. There is another equally reasonable one where the
> coefficients for schgendboysch and schgendgirlsch are constrained to zero,
> and the two interaction effects would then be non-zero. And infinitely more
> where all 7 coefficients are non-zero, and there are two linear constraints
> among them.
> >>
> >> Of course, since the particular estimate shown consists of all the main
> effects and interactions are constrained to zero, it does demonstrate that
> the additive model *could* have been used to obtain the same estimates and
> standard errors, and you can see that by comparing the results (and
> ignoring the invalid ones from the additive model). But it is just a lucky
> coincidence that it worked out this way, and the additive model did lead us
> down a primrose path containing silly results among the correct ones.
> >>
> >> Russ
> >>
> >> -----Original Message-----
> >> From: Lenth, Russell V
> >> Sent: Sunday, September 26, 2021 7:43 PM
> >> To: Simon Harmel <sim.harmel using gmail.com>
> >> Cc: r-sig-mixed-models using r-project.org
> >> Subject: RE: [External] Re: [R-sig-ME] Help with interpreting one
> fixed-effect coefficient
> >>
> >> I guess correctness is in the eyes of the beholder. But I think this
> illustrates the folly of the additive model. Having additive effects
> suggests a belief that you can vary one factor more or less independently
> of the other. In his comments, John Fox makes a good point that escaped my
> earlier cursory view of the original question, that you don't have data on
> girls attending all-boys' schools, nor boys attending all-girls' schools;
> yet the model that was fitted estimates a mean response for both those
> situations. That's a pretty clear testament to the failure of that model -
> and also why the coefficients don't make sense. And finally why we have
> estimates of 15 comparisons (some of which are aliased with one another),
> when only 6 of them make sense.
> >>
> >> If instead, a model with interaction were fitted, it would be a
> rank-deficient model because two cells are empty. Perhaps there is some
> sort of nesting structure that could be used to work around that. However,
> it doesn't matter much because emmeans assesses estimability, and the two
> combinations I mentioned above would be flagged as non-estimable. One could
> then more judiciously use the contrast function to test meaningful
> contrasts across this irregular array of cell means. Or even injudiciously
> asking for all pairwise comparisons, you will see 6 estimable ones and 9
> non-estimable ones. See output below.
> >>
> >> Russ
> >>
> >> ----- Interactive model -----
> >>
> >>> Form <- normexam ~ 1 + standlrt + schgend * sex + (standlrt | school)
> >>> model <- lmer(Form, data = tutorial, REML = FALSE)
> >> fixed-effect model matrix is rank deficient so dropping 2 columns /
> coefficients
> >>>
> >>> emmeans(model, pairwise~schgend+sex)
> >>
> >> ... messages deleted ...
> >>
> >> $emmeans
> >> schgend  sex    emmean     SE  df asymp.LCL asymp.UCL
> >> mixedsch boy  -0.18781 0.0514 Inf   -0.2885   -0.0871
> >> boysch   boy  -0.00795 0.0880 Inf   -0.1805    0.1646
> >> girlsch  boy    nonEst     NA  NA        NA        NA
> >> mixedsch girl -0.01955 0.0521 Inf   -0.1216    0.0825
> >> boysch   girl   nonEst     NA  NA        NA        NA
> >> girlsch  girl  0.15527 0.0632 Inf    0.0313    0.2792
> >>
> >> Degrees-of-freedom method: asymptotic
> >> Confidence level used: 0.95
> >>
> >> $contrasts
> >> contrast                     estimate     SE  df z.ratio p.value
> >> mixedsch boy - boysch boy     -0.1799 0.0991 Inf  -1.814  0.4565
> >> mixedsch boy - girlsch boy     nonEst     NA  NA      NA      NA
> >> mixedsch boy - mixedsch girl  -0.1683 0.0338 Inf  -4.975  <.0001
> >> mixedsch boy - boysch girl     nonEst     NA  NA      NA      NA
> >> mixedsch boy - girlsch girl   -0.3431 0.0780 Inf  -4.396  0.0002
> >> boysch boy - girlsch boy       nonEst     NA  NA      NA      NA
> >> boysch boy - mixedsch girl     0.0116 0.0997 Inf   0.116  1.0000
> >> boysch boy - boysch girl       nonEst     NA  NA      NA      NA
> >> boysch boy - girlsch girl     -0.1632 0.1058 Inf  -1.543  0.6361
> >> girlsch boy - mixedsch girl    nonEst     NA  NA      NA      NA
> >> girlsch boy - boysch girl      nonEst     NA  NA      NA      NA
> >> girlsch boy - girlsch girl     nonEst     NA  NA      NA      NA
> >> mixedsch girl - boysch girl    nonEst     NA  NA      NA      NA
> >> mixedsch girl - girlsch girl  -0.1748 0.0788 Inf  -2.219  0.2287
> >> boysch girl - girlsch girl     nonEst     NA  NA      NA      NA
> >>
> >> Degrees-of-freedom method: asymptotic
> >> P value adjustment: tukey method for comparing a family of 6 estimates
> >>
> >>
> >> ---------------------------------------------------------
> >> From: Simon Harmel <sim.harmel using gmail.com>
> >> Sent: Sunday, September 26, 2021 3:08 PM
> >> To: Lenth, Russell V <russell-lenth using uiowa.edu>
> >> Cc: r-sig-mixed-models using r-project.org
> >> Subject: [External] Re: [R-sig-ME] Help with interpreting one
> fixed-effect coefficient
> >>
> >> Dear Russ and the List Members,
> >>
> >> If we use Russ' great package (emmeans), we see that, although
> meaningless, "schgendgirl-only" can be interpreted using the logic I
> mentioned here:
> https://stat.ethz.ch/pipermail/r-sig-mixed-models/2021q3/029723.html .
> >>
> >> That is, "schgendgirl-only" can meaninglessly mean: ***diff. bet. boys
> in girl-only vs. mixed schools*** just like it can meaningfully mean:
> ***diff. bet. girls in girl-only vs. mixed schools***
> >>
> >> Russ, have I used emmeans correctly?
> >>
> >> Simon
> >>
> >> Here is reproducible code:
> >>
> >> library(R2MLwiN) # For the dataset
> >> library(lme4)
> >> library(emmeans)
> >>
> >> data("tutorial")
> >>
> >> Form <- normexam ~ 1 + standlrt + schgend + sex + (standlrt | school)
> >> model <- lmer(Form, data = tutorial, REML = FALSE)
> >>
> >> emmeans(model, pairwise~schgend+sex)$contrast
> >>
> >> contrast                     estimate     SE  df z.ratio p.value
> >> mixedsch boy - boysch boy    -0.17986 0.0991 Inf  -1.814  0.4565
> >> mixedsch boy - girlsch boy   -0.17482 0.0788 Inf  -2.219  0.2287  <-- This coef. equals
> >> mixedsch boy - mixedsch girl -0.16826 0.0338 Inf  -4.975  <.0001
> >> mixedsch boy - boysch girl   -0.34813 0.1096 Inf  -3.178  0.0186
> >> mixedsch boy - girlsch girl  -0.34308 0.0780 Inf  -4.396  0.0002
> >> boysch boy - girlsch boy      0.00505 0.1110 Inf   0.045  1.0000
> >> boysch boy - mixedsch girl    0.01160 0.0997 Inf   0.116  1.0000
> >> boysch boy - boysch girl     -0.16826 0.0338 Inf  -4.975  <.0001
> >> boysch boy - girlsch girl    -0.16322 0.1058 Inf  -1.543  0.6361
> >> girlsch boy - mixedsch girl   0.00656 0.0928 Inf   0.071  1.0000
> >> girlsch boy - boysch girl    -0.17331 0.1255 Inf  -1.381  0.7388
> >> girlsch boy - girlsch girl   -0.16826 0.0338 Inf  -4.975  <.0001
> >> mixedsch girl - boysch girl  -0.17986 0.0991 Inf  -1.814  0.4565
> >> mixedsch girl - girlsch girl -0.17482 0.0788 Inf  -2.219  0.2287  <-- This coef.
> >> boysch girl - girlsch girl    0.00505 0.1110 Inf   0.045  1.0000
> >>
> >>
> >
> > _______________________________________________
> > R-sig-mixed-models using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >
> --
> John Fox, Professor Emeritus
> McMaster University
> Hamilton, Ontario, Canada
> web: https://socialsciences.mcmaster.ca/jfox/
>