[R-sig-ME] Collinearity diagnostics for (mixed) multinomial models

Juho Kristian Ruohonen juho@kr|@t|@n@ruohonen @end|ng |rom gm@||@com
Tue Mar 1 21:13:35 CET 2022


Dear John,

Yes, my function uses your code for the math. I was just hoping to verify
that it is handling multicategory factors correctly (your examples didn't
involve any).

I guess interactions aren't that important after all, given that the chief
concern is usually collinearity among main effects.

Many thanks for all your help.

Best,

Juho

ti 1. maalisk. 2022 klo 18.01 John Fox (jfox using mcmaster.ca) kirjoitti:

> Dear Juho,
>
> On 2022-03-01 8:24 a.m., Juho Kristian Ruohonen wrote:
> > Dear John (Fox, as well as other list members),
> >
> > I've now written a simple function to try and calculate GVIFS for all
> > predictors in a nnet::multinom() object based on John's example code. If
> > its results are correct (see below), I will proceed to write a version
> > that also works with mixed-effects multinomial models fit by
> > brms::brm(). Here's the code:
> >
> >     gvif.multinom <- function(model){
> >        (classes <- model$lev)
> >        (V.all <- vcov(model))
> >        (V.noIntercepts <- V.all[!grepl("\\(Intercept\\)$",
> >     rownames(V.all), perl = T),
> >                                 !grepl("\\(Intercept\\)$",
> >     colnames(V.all), perl = T)])
> >        (R <- cov2cor(V.noIntercepts))
> >        (terms <- attr(model$terms, "term.labels"))
> >        (gvif <- numeric(length = length(terms)))
> >        (names(gvif) <- terms)
> >        (SE.multiplier <- numeric(length = length(terms)))
> >        (names(SE.multiplier) <- terms)
> >        #The line below tries to capture all factor levels into a regex
> >     for coef name matching.
> >        (LevelsRegex <- paste0("(", paste(unlist(model$xlevels), collapse
> >     = "|"),")?"))
> >
> >        for(i in terms){
> >          #The regex stuff below tries to ensure all interaction
> >     coefficients are matched, including those involving factors.
> >          if(grepl(":", i)){
> >            (termname <- gsub(":", paste0(LevelsRegex, ":"), i, perl = T))
> >          }else{termname <- i}
> >          (RegexToMatch <- paste0("^(", paste(classes[2:length(classes)],
> >     collapse = "|") ,"):", termname, LevelsRegex, "$"))
> >
> >          #Now the actual calculation:
> >          (indices <- grep(RegexToMatch, rownames(R), perl = T))
> >          (gvif[i] <- det(R[indices, indices]) * det(R[-indices,
> >     -indices]) / det(R))
> >          (SE.multiplier[i] <- gvif[i]^(1/(2*length(indices))))
> >        }
> >        #Put the results together and order them by degree of SE
> inflation:
> >        (result <- cbind(GVIF = gvif, `GVIF^(1/(2df))` = SE.multiplier))
> >        return(result[order(result[,"GVIF^(1/(2df))"], decreasing = T),])}
> >
> >
> > The results seem correct to me when applied to John's example model fit
> > to the BEPS data. However, that dataset contains no multi-df factors, of
> > which my own models have many. Below is a maximally simple example with
> > one multi-df factor (/region/):
> >
> >     mod1 <- multinom(partic ~., data = carData::Womenlf)
> >     gvif.multinom(mod1)
> >
> >     GVIF GVIF^(1/(2df))
> >     children 1.298794       1.067542
> >     hincome  1.184215       1.043176
> >     region   1.381480       1.020403
> >
> >
> > These results look plausible to me. Finally, below is an example
> > involving both a multi-df factor and an interaction:
> >
> >     mod2 <- update(mod1, ~. +children:region)
> >     gvif.multinom(mod2)
> >
> >                              GVIF GVIF^(1/(2df))
> >     children:region 4.965762e+16      11.053482
> >     region          1.420418e+16      10.221768
> >     children        1.471412e+03       6.193463
> >     hincome         6.462161e+00       1.594390
> >
> >
> > These results look a bit more dubious. To be sure, it is to be expected
> > that interaction terms will introduce a lot of collinearity. But an
> > 11-fold increase in SE? I hope someone can tell me whether this is
> > correct or not!
>
> You don't need someone else to check your work because you could just
> apply the simple function that I sent you yesterday, which, though not
> automatic, computes the GVIFs in a transparent manner.
>
> A brief comment on GVIFs for models with interactions (this isn't the
> place to discuss the question in detail): The Fox and Monette JASA paper
> addresses the question briefly in the context of a two-way ANOVA, but I
> don't think that the approach suggested there is easily generalized.
>
> The following simple approach pays attention to what's invariant under
> different parametrizations of the RHS side of the model: Simultaneously
> check the collinearity of all of the coefficients of an interaction
> together with the main effects and, potentially, lower-order
> interactions that are marginal to it. So, e.g., in the model y ~ a + b +
> a:b + c, you'd check all of the coefficients for a, b, and a:b together.
>
> Alternatively, one could focus in turn on each explanatory variable and
> check the collinearity of all coefficients to which it is marginal. So
> in y ~ a + b + c + a:b + a:c + d, when you focus on a, you'd look at all
> of the coefficients for a, b, c, a:b, and a:c.
>
> John
>
> >
> > Best,
> >
> > Juho
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > ti 1. maalisk. 2022 klo 0.05 John Fox (jfox using mcmaster.ca
> > <mailto:jfox using mcmaster.ca>) kirjoitti:
> >
> >     Dear Juha,
> >
> >     On 2022-02-28 5:00 p.m., Juho Kristian Ruohonen wrote:
> >      > Apologies for my misreading, John, and many thanks for showing
> >     how the
> >      > calculation is done for a single term.
> >      >
> >      > Do you think *vif()* might be updated in the near future with the
> >      > capability of auto-detecting a multinomial model and returning
> >      > mathematically correct GVIF statistics?
> >
> >     The thought crossed my mind, but I'd want to do it in a general way,
> >     not
> >     just for the multinom() function, and in a way that avoids incorrect
> >     results such as those currently produced for "multinom" models,
> albeit
> >     with a warning. I can't guarantee whether or when I'll be able to do
> >     that.
> >
> >     John
> >
> >      >
> >      > If not, I'll proceed to writing my own function based on your
> >     example.
> >      > However, /car/ is such an excellent and widely used package that
> the
> >      > greatest benefit to mankind would probably accrue if /car /was
> >     upgraded
> >      > with this feature sooner rather than later.
> >      >
> >      > Best,
> >      >
> >      > Juho
> >      >
> >      >
> >      >
> >      >
> >      >
> >      >
> >      >
> >      >
> >      >
> >      > ma 28. helmik. 2022 klo 17.08 John Fox (jfox using mcmaster.ca
> >     <mailto:jfox using mcmaster.ca>
> >      > <mailto:jfox using mcmaster.ca <mailto:jfox using mcmaster.ca>>) kirjoitti:
> >      >
> >      >     Dear Juho,
> >      >
> >      >     On 2022-02-28 2:06 a.m., Juho Kristian Ruohonen wrote:
> >      >      > Dear Professor Fox and other list members,
> >      >      >
> >      >      > Profuse thanks for doing that detective work for me! I
> myself
> >      >     thought
> >      >      > the inflation factors reported by check_collinearity() were
> >      >     suspiciously
> >      >      > high, but unlike you I lacked the expertise to identify
> >     what was
> >      >     going on.
> >      >      >
> >      >      > As for your suggested approach, have I understood this
> >     correctly:
> >      >      >
> >      >      > Since there doesn't yet exist an R function that will
> >     calculate the
> >      >      > (G)VIFS of multinomial models correctly, my best bet for
> >     now is
> >      >     just to
> >      >      > ignore the fact that such models partition the data into
> C-1
> >      >     subsets,
> >      >      > and to calculate approximate GVIFs from the entire dataset
> at
> >      >     once as if
> >      >      > the response were continuous? And a simple way to do this
> >     is to
> >      >      > construct a fake continuous response, call
> >     *lm(fakeresponse ~.)*,
> >      >     and
> >      >      > apply *car::vif()* on the result?
> >      >
> >      >     No, you misunderstand my suggestion, which perhaps isn't
> >     surprising
> >      >     given the length of my message. What you propose is what I
> >     suggested as
> >      >     a rough approximation *before* I confirmed that my guess of
> the
> >      >     solution
> >      >     was correct.
> >      >
> >      >     The R code that I sent yesterday showed how to compute the
> >     GVIF for a
> >      >     multinomial regression model, and I suggested that you write
> >     either a
> >      >     script or a simple function to do that. Here's a function
> >     that will
> >      >     work
> >      >     for a model object that responds to vcov():
> >      >
> >      >     GVIF <- function(model, intercepts, term){
> >      >         # model: regression model object
> >      >         # intercepts: row/column positions of intercepts in the
> >     coefficient
> >      >     covariance matrix
> >      >         # term: row/column positions of the coefficients for the
> >     focal term
> >      >         V <- vcov(model)
> >      >         term <- colnames(V)[term]
> >      >         V <- V[-intercepts, -intercepts]
> >      >         V <- cov2cor(V)
> >      >         term <- which(colnames(V) %in% term)
> >      >         gvif <- det(V[term, term])*det(V[-term, -term])/det(V)
> >      >         c(GVIF=gvif, "GVIF^(1/(2*p))"=gvif^(1/(2*length(term))))
> >      >     }
> >      >
> >      >     and here's an application to the multinom() example that I
> >     showed you
> >      >     yesterday:
> >      >
> >      >       > colnames(vcov(m)) # to get coefficient positions
> >      >        [1] "Labour:(Intercept)"                       "Labour:age"
> >      >
> >      >        [3] "Labour:economic.cond.national"
> >      >     "Labour:economic.cond.household"
> >      >        [5] "Labour:Blair"
>  "Labour:Hague"
> >      >
> >      >        [7] "Labour:Kennedy"
>  "Labour:Europe"
> >      >
> >      >        [9] "Labour:political.knowledge"
> >       "Labour:gendermale"
> >      >
> >      >     [11] "Liberal Democrat:(Intercept)"             "Liberal
> >     Democrat:age"
> >      >
> >      >     [13] "Liberal Democrat:economic.cond.national"  "Liberal
> >      >     Democrat:economic.cond.household"
> >      >     [15] "Liberal Democrat:Blair"                   "Liberal
> >      >     Democrat:Hague"
> >      >
> >      >     [17] "Liberal Democrat:Kennedy"                 "Liberal
> >      >     Democrat:Europe"
> >      >     [19] "Liberal Democrat:political.knowledge"     "Liberal
> >      >     Democrat:gendermale"
> >      >
> >      >       > GVIF(m, intercepts=c(1, 11), term=c(2, 12)) # GVIF for age
> >      >                 GVIF GVIF^(1/(2*p))
> >      >             1.046232       1.011363
> >      >
> >      >
> >      >     Finally, here's what you get for a linear model with the same
> RHS
> >      >     (where
> >      >     the sqrt(VIF) should be a rough approximation to GVIF^(1/4)
> >     reported by
> >      >     my GVIF() function):
> >      >
> >      >       > m.lm <- lm(as.numeric(vote) ~ . - vote1, data=BEPS)
> >      >       > sqrt(car::vif(m.lm))
> >      >                           age  economic.cond.national
> >      >     economic.cond.household
> >      >                         Blair
> >      >                      1.006508                1.124132
> >      >     1.075656
> >      >                      1.118441
> >      >                         Hague                 Kennedy
> >      >     Europe
> >      >           political.knowledge
> >      >                      1.066799                1.015532
> >      >     1.101741
> >      >                      1.028546
> >      >                        gender
> >      >                      1.017386
> >      >
> >      >
> >      >     John
> >      >
> >      >      >
> >      >      > Best,
> >      >      >
> >      >      > Juho
> >      >      >
> >      >      > ma 28. helmik. 2022 klo 2.23 John Fox (jfox using mcmaster.ca
> >     <mailto:jfox using mcmaster.ca>
> >      >     <mailto:jfox using mcmaster.ca <mailto:jfox using mcmaster.ca>>
> >      >      > <mailto:jfox using mcmaster.ca <mailto:jfox using mcmaster.ca>
> >     <mailto:jfox using mcmaster.ca <mailto:jfox using mcmaster.ca>>>) kirjoitti:
> >      >      >
> >      >      >     Dear Juho,
> >      >      >
> >      >      >     I've now had a chance to think about this problem some
> >     more,
> >      >     and I
> >      >      >     believe that the approach I suggested is correct. I
> >     also had an
> >      >      >     opportunity to talk the problem over a bit with Georges
> >      >     Monette, who
> >      >      >     coauthored the paper that introduced generalized
> variance
> >      >     inflation
> >      >      >     factors (GVIFs). On the other hand, the results
> >     produced by
> >      >      >     performance::check_collinearity() for multinomial logit
> >      >     models don't
> >      >      >     seem to be correct (see below).
> >      >      >
> >      >      >     Here's an example, using the nnet::multinom() function
> >     to fit a
> >      >      >     multinomial logit model, with alternative
> >     parametrizations of the
> >      >      >     LHS of
> >      >      >     the model:
> >      >      >
> >      >      >     --------- snip -----------
> >      >      >
> >      >      >       > library(nnet) # for multinom()
> >      >      >       > library(carData) # for BEPS data set
> >      >      >
> >      >      >       > # alternative ordering of the response levels:
> >      >      >       > BEPS$vote1 <- factor(BEPS$vote, levels=c("Labour",
> >     "Liberal
> >      >      >     Democrat", "Conservative"))
> >      >      >       > levels(BEPS$vote)
> >      >      >     [1] "Conservative"     "Labour"           "Liberal
> >     Democrat"
> >      >      >       > levels(BEPS$vote1)
> >      >      >     [1] "Labour"           "Liberal Democrat"
> "Conservative"
> >      >      >
> >      >      >       > m <- multinom(vote ~ . - vote1, data=BEPS)
> >      >      >     # weights:  33 (20 variable)
> >      >      >     initial  value 1675.383740
> >      >      >     iter  10 value 1345.935273
> >      >      >     iter  20 value 1150.956807
> >      >      >     iter  30 value 1141.921662
> >      >      >     iter  30 value 1141.921661
> >      >      >     iter  30 value 1141.921661
> >      >      >     final  value 1141.921661
> >      >      >     converged
> >      >      >       > m1 <- multinom(vote1 ~ . - vote, data=BEPS)
> >      >      >     # weights:  33 (20 variable)
> >      >      >     initial  value 1675.383740
> >      >      >     iter  10 value 1280.439304
> >      >      >     iter  20 value 1165.513772
> >      >      >     final  value 1141.921662
> >      >      >     converged
> >      >      >
> >      >      >       > rbind(coef(m), coef(m1)) # compare coefficients
> >      >      >                        (Intercept)          age
> >      >     economic.cond.national
> >      >      >     economic.cond.household
> >      >      >     Labour             0.9515214 -0.021913989
> >     0.5575707
> >      >      >            0.15839096
> >      >      >     Liberal Democrat   1.4119306 -0.016810735
> >     0.1810761
> >      >      >           -0.01196664
> >      >      >     Liberal Democrat   0.4604567  0.005102666
> >       -0.3764928
> >      >      >           -0.17036682
> >      >      >     Conservative      -0.9514466  0.021912305
> >       -0.5575644
> >      >      >           -0.15838744
> >      >      >                             Blair       Hague    Kennedy
> >          Europe
> >      >      >     political.knowledge
> >      >      >     Labour            0.8371764 -0.90775585  0.2513436
> >     -0.22781308
> >      >      >     -0.5370612
> >      >      >     Liberal Democrat  0.2937331 -0.82217625  0.6710567
> >     -0.20004624
> >      >      >     -0.2034605
> >      >      >     Liberal Democrat -0.5434408  0.08559455  0.4197027
> >     0.02776465
> >      >      >     0.3336068
> >      >      >     Conservative     -0.8371670  0.90778068 -0.2513735
> >     0.22781092
> >      >      >     0.5370545
> >      >      >                         gendermale
> >      >      >     Labour            0.13765774
> >      >      >     Liberal Democrat  0.12640823
> >      >      >     Liberal Democrat -0.01125898
> >      >      >     Conservative     -0.13764849
> >      >      >
> >      >      >       > c(logLik(m), logLik(m1)) # same fit to the data
> >      >      >     [1] -1141.922 -1141.922
> >      >      >
> >      >      >       > # covariance matrices for coefficients:
> >      >      >       > V <- vcov(m)
> >      >      >       > V1 <- vcov(m1)
> >      >      >       > cbind(colnames(V), colnames(V1)) # compare
> >      >      >             [,1]                                       [,2]
> >      >      >
> >      >      >        [1,] "Labour:(Intercept)"
> >       "Liberal
> >      >      >     Democrat:(Intercept)"
> >      >      >        [2,] "Labour:age"
> >       "Liberal
> >      >      >     Democrat:age"
> >      >      >
> >      >      >        [3,] "Labour:economic.cond.national"
> >     "Liberal
> >      >      >     Democrat:economic.cond.national"
> >      >      >        [4,] "Labour:economic.cond.household"
> >       "Liberal
> >      >      >     Democrat:economic.cond.household"
> >      >      >        [5,] "Labour:Blair"
> >       "Liberal
> >      >      >     Democrat:Blair"
> >      >      >        [6,] "Labour:Hague"
> >       "Liberal
> >      >      >     Democrat:Hague"
> >      >      >        [7,] "Labour:Kennedy"
> >       "Liberal
> >      >      >     Democrat:Kennedy"
> >      >      >        [8,] "Labour:Europe"
> >     "Liberal
> >      >      >     Democrat:Europe"
> >      >      >        [9,] "Labour:political.knowledge"
> >       "Liberal
> >      >      >     Democrat:political.knowledge"
> >      >      >     [10,] "Labour:gendermale"
> "Liberal
> >      >      >     Democrat:gendermale"
> >      >      >     [11,] "Liberal Democrat:(Intercept)"
> >      >      >     "Conservative:(Intercept)"
> >      >      >     [12,] "Liberal Democrat:age"
> >      >       "Conservative:age"
> >      >      >
> >      >      >     [13,] "Liberal Democrat:economic.cond.national"
> >      >      >     "Conservative:economic.cond.national"
> >      >      >     [14,] "Liberal Democrat:economic.cond.household"
> >      >      >     "Conservative:economic.cond.household"
> >      >      >     [15,] "Liberal Democrat:Blair"
> >      >       "Conservative:Blair"
> >      >      >
> >      >      >     [16,] "Liberal Democrat:Hague"
> >      >       "Conservative:Hague"
> >      >      >
> >      >      >     [17,] "Liberal Democrat:Kennedy"
> >      >       "Conservative:Kennedy"
> >      >      >
> >      >      >     [18,] "Liberal Democrat:Europe"
> >      >     "Conservative:Europe"
> >      >      >
> >      >      >     [19,] "Liberal Democrat:political.knowledge"
> >      >      >     "Conservative:political.knowledge"
> >      >      >     [20,] "Liberal Democrat:gendermale"
> >      >      >     "Conservative:gendermale"
> >      >      >
> >      >      >       > int <- c(1, 11) # remove intercepts
> >      >      >       > colnames(V)[int]
> >      >      >     [1] "Labour:(Intercept)"           "Liberal
> >     Democrat:(Intercept)"
> >      >      >
> >      >      >       > colnames(V1)[int]
> >      >      >     [1] "Liberal Democrat:(Intercept)"
> >     "Conservative:(Intercept)"
> >      >      >       > V <- V[-int, -int]
> >      >      >       > V1 <- V1[-int, -int]
> >      >      >
> >      >      >       > age <- c(1, 10) # locate age coefficients
> >      >      >       > colnames(V)[age]
> >      >      >     [1] "Labour:age"           "Liberal Democrat:age"
> >      >      >       > colnames(V1)[age]
> >      >      >     [1] "Liberal Democrat:age" "Conservative:age"
> >      >      >
> >      >      >       > V <- cov2cor(V) # compute coefficient correlations
> >      >      >       > V1 <- cov2cor(V1)
> >      >      >
> >      >      >       > # compare GVIFs:
> >      >      >       > c(det(V[age, age])*det(V[-age, -age])/det(V),
> >      >      >     +   det(V1[age, age])*det(V1[-age, -age])/det(V1))
> >      >      >     [1] 1.046232 1.046229
> >      >      >
> >      >      >     --------- snip -----------
> >      >      >
> >      >      >     For curiosity, I applied car::vif() and
> >      >      >     performance::check_collinearity() to these models to
> >     see what
> >      >     they
> >      >      >     would
> >      >      >     do. Both returned the wrong answer. vif() produced a
> >     warning, but
> >      >      >     check_collinearity() didn't:
> >      >      >
> >      >      >     --------- snip -----------
> >      >      >
> >      >      >       > car::vif(m1)
> >      >      >                           age  economic.cond.national
> >      >      >     economic.cond.household
> >      >      >                     15.461045               22.137772
> >      >      >       16.693877
> >      >      >                         Blair                   Hague
> >      >      >       Kennedy
> >      >      >                     14.681562                7.483039
> >      >      >       15.812067
> >      >      >                        Europe     political.knowledge
> >      >      >     gender
> >      >      >                      6.502119                4.219507
> >      >      >     2.313885
> >      >      >     Warning message:
> >      >      >     In vif.default(m1) : No intercept: vifs may not be
> >     sensible.
> >      >      >
> >      >      >       > performance::check_collinearity(m)
> >      >      >     # Check for Multicollinearity
> >      >      >
> >      >      >     Low Correlation
> >      >      >
> >      >      >                           Term  VIF Increased SE Tolerance
> >      >      >                            age 1.72         1.31      0.58
> >      >      >         economic.cond.national 1.85         1.36      0.54
> >      >      >        economic.cond.household 1.86         1.37      0.54
> >      >      >                          Blair 1.63         1.28      0.61
> >      >      >                          Hague 1.94         1.39      0.52
> >      >      >                        Kennedy 1.70         1.30      0.59
> >      >      >                         Europe 2.01         1.42      0.50
> >      >      >            political.knowledge 1.94         1.39      0.52
> >      >      >                         gender 1.78         1.33      0.56
> >      >      >       > performance::check_collinearity(m1)
> >      >      >     # Check for Multicollinearity
> >      >      >
> >      >      >     Low Correlation
> >      >      >
> >      >      >                           Term  VIF Increased SE Tolerance
> >      >      >                            age 1.19         1.09      0.84
> >      >      >         economic.cond.national 1.42         1.19      0.70
> >      >      >        economic.cond.household 1.32         1.15      0.76
> >      >      >                          Blair 1.50         1.22      0.67
> >      >      >                          Hague 1.30         1.14      0.77
> >      >      >                        Kennedy 1.19         1.09      0.84
> >      >      >                         Europe 1.34         1.16      0.75
> >      >      >            political.knowledge 1.30         1.14      0.77
> >      >      >                         gender 1.23         1.11      0.81
> >      >      >
> >      >      >     --------- snip -----------
> >      >      >
> >      >      >     I looked at the code for vif() and
> check_collinearity() to
> >      >     see where
> >      >      >     they went wrong. Both failed to handle the two
> >     intercepts in
> >      >     the model
> >      >      >     correctly -- vif() thought there was no intercept and
> >      >      >     check_collinearity() just removed the first intercept
> >     but not the
> >      >      >     second.
> >      >      >
> >      >      >     In examining the code for check_collinearity(), I
> >     discovered a
> >      >      >     couple of
> >      >      >     additional disconcerting facts. First, part of the
> >     code seems
> >      >     to be
> >      >      >     copied from vif.default(). Second, as a consequence,
> >      >      >     check_collinearity() actually computes GVIFs rather
> >     than VIFs
> >      >     (and
> >      >      >     doesn't reference either the Fox and Monette paper
> >      >     introducing GVIFs or
> >      >      >     the car package) but doesn't seem to understand that,
> and,
> >      >     for example,
> >      >      >     takes the squareroot of the GVIF (reported in the
> >     column marked
> >      >      >     "Increased SE") rather than the 2p root (when there
> >     are p > 1
> >      >      >     coefficients in a term).
> >      >      >
> >      >      >     Here's the relevant code from the two functions (where
> >     . . .
> >      >     denotes
> >      >      >     elided lines) -- the default method for vif() and
> >      >      >     .check_collinearity(),
> >      >      >     which is called by check_collinearity.default():
> >      >      >
> >      >      >     --------- snip -----------
> >      >      >
> >      >      >       > car:::vif.default
> >      >      >     function (mod, ...)
> >      >      >     {
> >      >      >           . . .
> >      >      >           v <- vcov(mod)
> >      >      >           assign <- attr(model.matrix(mod), "assign")
> >      >      >           if (names(coefficients(mod)[1]) ==
> "(Intercept)") {
> >      >      >               v <- v[-1, -1]
> >      >      >               assign <- assign[-1]
> >      >      >           }
> >      >      >           else warning("No intercept: vifs may not be
> >     sensible.")
> >      >      >           terms <- labels(terms(mod))
> >      >      >           n.terms <- length(terms)
> >      >      >           if (n.terms < 2)
> >      >      >               stop("model contains fewer than 2 terms")
> >      >      >           R <- cov2cor(v)
> >      >      >           detR <- det(R)
> >      >      >           . . .
> >      >      >           for (term in 1:n.terms) {
> >      >      >               subs <- which(assign == term)
> >      >      >               result[term, 1] <- det(as.matrix(R[subs,
> >     subs])) *
> >      >      >     det(as.matrix(R[-subs,
> >      >      >                   -subs]))/detR
> >      >      >               result[term, 2] <- length(subs)
> >      >      >           }
> >      >      >           . . .
> >      >      >     }
> >      >      >
> >      >      >       > performance:::.check_collinearity
> >      >      >     function (x, component, verbose = TRUE)
> >      >      >     {
> >      >      >           v <- insight::get_varcov(x, component =
> component,
> >      >     verbose =
> >      >      >     FALSE)
> >      >      >           assign <- .term_assignments(x, component,
> verbose =
> >      >     verbose)
> >      >      >           . . .
> >      >      >           if (insight::has_intercept(x)) {
> >      >      >               v <- v[-1, -1]
> >      >      >               assign <- assign[-1]
> >      >      >           }
> >      >      >           else {
> >      >      >               if (isTRUE(verbose)) {
> >      >      >                   warning("Model has no intercept. VIFs
> >     may not be
> >      >      >     sensible.",
> >      >      >                       call. = FALSE)
> >      >      >               }
> >      >      >           }
> >      >      >               . . .
> >      >      >               terms <- labels(stats::terms(f[[component]]))
> >      >      >               . . .
> >      >      >           n.terms <- length(terms)
> >      >      >           if (n.terms < 2) {
> >      >      >               if (isTRUE(verbose)) {
> >      >      >
>  warning(insight::format_message(sprintf("Not
> >      >     enough model
> >      >      >     terms in the %s part of the model to check for
> >      >     multicollinearity.",
> >      >      >                       component)), call. = FALSE)
> >      >      >               }
> >      >      >               return(NULL)
> >      >      >           }
> >      >      >           R <- stats::cov2cor(v)
> >      >      >           detR <- det(R)
> >      >      >           . . .
> >      >      >           for (term in 1:n.terms) {
> >      >      >               subs <- which(assign == term)
> >      >      >                   . . .
> >      >      >                   result <- c(result,
> >     det(as.matrix(R[subs, subs])) *
> >      >      >                       det(as.matrix(R[-subs, -subs]))/detR)
> >      >      >                   . . .
> >      >      >           }
> >      >      >           . . .
> >      >      >     }
> >      >      >
> >      >      >     --------- snip -----------
> >      >      >
> >      >      >     So, the upshot of all this is that you should be able
> >     to do
> >      >     what you
> >      >      >     want, but not with either car::vif() or
> >      >      >     performance::check_collinearity(). Instead, either
> >     write your own
> >      >      >     function or do the computations in a script.
> >      >      >
> >      >      >     There's also a lesson here about S3 default methods:
> >     The fact
> >      >     that a
> >      >      >     default method returns a result rather than throwing
> >     an error
> >      >     or a
> >      >      >     warning doesn't mean that the result is the right
> answer.
> >      >      >
> >      >      >     I hope this helps,
> >      >      >        John
> >      >      >
> >      >      >
> >      >      >     On 2022-02-26 3:45 p.m., Juho Kristian Ruohonen wrote:
> >      >      >      > Dear John W,
> >      >      >      >
> >      >      >      > Thank you very much for the tip-off! Apologies for
> not
> >      >     responding
> >      >      >     earlier
> >      >      >      > (gmail apparently decided to direct your email
> >     right into the
> >      >      >     junk folder).
> >      >      >      > I am very pleased to note that the package you
> >     mention does
> >      >      >     indeed work
> >      >      >      > with *brms* multinomial models! Thanks again!
> >      >      >      >
> >      >      >      > Best,
> >      >      >      >
> >      >      >      > Juho
> >      >      >      >
> >      >      >      > pe 25. helmik. 2022 klo 19.23 John Willoughby
> >      >      >     (johnwillec using gmail.com <mailto:johnwillec using gmail.com>
> >     <mailto:johnwillec using gmail.com <mailto:johnwillec using gmail.com>>
> >      >     <mailto:johnwillec using gmail.com <mailto:johnwillec using gmail.com>
> >     <mailto:johnwillec using gmail.com <mailto:johnwillec using gmail.com>>>)
> >      >      >      > kirjoitti:
> >      >      >      >
> >      >      >      >> Have you tried the check_collinearity() function
> >     in the
> >      >     performance
> >      >      >      >> package? It's supposed to work on brms models, but
> >     whether it
> >      >      >     will work on
> >      >      >      >> a multinomial model I don't know.  It works well
> >     on mixed
> >      >     models
> >      >      >     generated
> >      >      >      >> by glmmTMB().
> >      >      >      >>
> >      >      >      >> John Willoughby
> >      >      >      >>
> >      >      >      >>
> >      >      >      >> On Fri, Feb 25, 2022 at 3:01 AM
> >      >      >     <r-sig-mixed-models-request using r-project.org
> >     <mailto:r-sig-mixed-models-request using r-project.org>
> >      >     <mailto:r-sig-mixed-models-request using r-project.org
> >     <mailto:r-sig-mixed-models-request using r-project.org>>
> >      >      >     <mailto:r-sig-mixed-models-request using r-project.org
> >     <mailto:r-sig-mixed-models-request using r-project.org>
> >      >     <mailto:r-sig-mixed-models-request using r-project.org
> >     <mailto:r-sig-mixed-models-request using r-project.org>>>>
> >      >      >      >> wrote:
> >      >      >      >>
> >      >      >      >>> Send R-sig-mixed-models mailing list submissions
> to
> >      >      >      >>> r-sig-mixed-models using r-project.org
> >     <mailto:r-sig-mixed-models using r-project.org>
> >      >     <mailto:r-sig-mixed-models using r-project.org
> >     <mailto:r-sig-mixed-models using r-project.org>>
> >      >      >     <mailto:r-sig-mixed-models using r-project.org
> >     <mailto:r-sig-mixed-models using r-project.org>
> >      >     <mailto:r-sig-mixed-models using r-project.org
> >     <mailto:r-sig-mixed-models using r-project.org>>>
> >      >      >      >>>
> >      >      >      >>> To subscribe or unsubscribe via the World Wide
> >     Web, visit
> >      >      >      >>>
> >     https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >     <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
> >      >     <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >     <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>>
> >      >      >
> >       <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >     <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
> >      >     <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >     <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>>>
> >      >      >      >>> or, via email, send a message with subject or
> >     body 'help' to
> >      >      >      >>> r-sig-mixed-models-request using r-project.org
> >     <mailto:r-sig-mixed-models-request using r-project.org>
> >      >     <mailto:r-sig-mixed-models-request using r-project.org
> >     <mailto:r-sig-mixed-models-request using r-project.org>>
> >      >      >     <mailto:r-sig-mixed-models-request using r-project.org
> >     <mailto:r-sig-mixed-models-request using r-project.org>
> >      >     <mailto:r-sig-mixed-models-request using r-project.org
> >     <mailto:r-sig-mixed-models-request using r-project.org>>>
> >      >      >      >>>
> >      >      >      >>> You can reach the person managing the list at
> >      >      >      >>> r-sig-mixed-models-owner using r-project.org
> >     <mailto:r-sig-mixed-models-owner using r-project.org>
> >      >     <mailto:r-sig-mixed-models-owner using r-project.org
> >     <mailto:r-sig-mixed-models-owner using r-project.org>>
> >      >      >     <mailto:r-sig-mixed-models-owner using r-project.org
> >     <mailto:r-sig-mixed-models-owner using r-project.org>
> >      >     <mailto:r-sig-mixed-models-owner using r-project.org
> >     <mailto:r-sig-mixed-models-owner using r-project.org>>>
> >      >      >      >>>
> >      >      >      >>> When replying, please edit your Subject line so
> it is
> >      >     more specific
> >      >      >      >>> than "Re: Contents of R-sig-mixed-models
> digest..."
> >      >      >      >>>
> >      >      >      >>>
> >      >      >      >>> Today's Topics:
> >      >      >      >>>
> >      >      >      >>>     1. Collinearity diagnostics for (mixed)
> >     multinomial
> >      >     models
> >      >      >      >>>        (Juho Kristian Ruohonen)
> >      >      >      >>>
> >      >      >      >>>
> >      >      >
> >      >
> >
>  ----------------------------------------------------------------------
> >      >      >      >>>
> >      >      >      >>> Message: 1
> >      >      >      >>> Date: Fri, 25 Feb 2022 10:23:25 +0200
> >      >      >      >>> From: Juho Kristian Ruohonen
> >      >     <juho.kristian.ruohonen using gmail.com
> >     <mailto:juho.kristian.ruohonen using gmail.com>
> >      >     <mailto:juho.kristian.ruohonen using gmail.com
> >     <mailto:juho.kristian.ruohonen using gmail.com>>
> >      >      >     <mailto:juho.kristian.ruohonen using gmail.com
> >     <mailto:juho.kristian.ruohonen using gmail.com>
> >      >     <mailto:juho.kristian.ruohonen using gmail.com
> >     <mailto:juho.kristian.ruohonen using gmail.com>>>>
> >      >      >      >>> To: John Fox <jfox using mcmaster.ca
> >     <mailto:jfox using mcmaster.ca> <mailto:jfox using mcmaster.ca
> >     <mailto:jfox using mcmaster.ca>>
> >      >     <mailto:jfox using mcmaster.ca <mailto:jfox using mcmaster.ca>
> >     <mailto:jfox using mcmaster.ca <mailto:jfox using mcmaster.ca>>>>
> >      >      >      >>> Cc: "r-sig-mixed-models using r-project.org
> >     <mailto:r-sig-mixed-models using r-project.org>
> >      >     <mailto:r-sig-mixed-models using r-project.org
> >     <mailto:r-sig-mixed-models using r-project.org>>
> >      >      >     <mailto:r-sig-mixed-models using r-project.org
> >     <mailto:r-sig-mixed-models using r-project.org>
> >      >     <mailto:r-sig-mixed-models using r-project.org
> >     <mailto:r-sig-mixed-models using r-project.org>>>"
> >      >      >      >>>          <r-sig-mixed-models using r-project.org
> >     <mailto:r-sig-mixed-models using r-project.org>
> >      >     <mailto:r-sig-mixed-models using r-project.org
> >     <mailto:r-sig-mixed-models using r-project.org>>
> >      >      >     <mailto:r-sig-mixed-models using r-project.org
> >     <mailto:r-sig-mixed-models using r-project.org>
> >      >     <mailto:r-sig-mixed-models using r-project.org
> >     <mailto:r-sig-mixed-models using r-project.org>>>>
> >      >      >      >>> Subject: [R-sig-ME] Collinearity diagnostics for
> >     (mixed)
> >      >      >     multinomial
> >      >      >      >>>          models
> >      >      >      >>> Message-ID:
> >      >      >      >>>          <
> >      >      >      >>>
> >      >      >
> >      >
> >     CAG_dBVfZr1-P7Q3kbE8TGPm-_2sJixdGCHCtWM9Q9PEnd8ftZw using mail.gmail.com
> >     <mailto:
> CAG_dBVfZr1-P7Q3kbE8TGPm-_2sJixdGCHCtWM9Q9PEnd8ftZw using mail.gmail.com>
> >      >
> >       <mailto:
> CAG_dBVfZr1-P7Q3kbE8TGPm-_2sJixdGCHCtWM9Q9PEnd8ftZw using mail.gmail.com
> <mailto:CAG_dBVfZr1-P7Q3kbE8TGPm-_2sJixdGCHCtWM9Q9PEnd8ftZw using mail.gmail.com
> >>
> >      >      >
> >      >
> >       <mailto:
> CAG_dBVfZr1-P7Q3kbE8TGPm-_2sJixdGCHCtWM9Q9PEnd8ftZw using mail.gmail.com
> <mailto:CAG_dBVfZr1-P7Q3kbE8TGPm-_2sJixdGCHCtWM9Q9PEnd8ftZw using mail.gmail.com>
> <mailto:CAG_dBVfZr1-P7Q3kbE8TGPm-_2sJixdGCHCtWM9Q9PEnd8ftZw using mail.gmail.com
> <mailto:CAG_dBVfZr1-P7Q3kbE8TGPm-_2sJixdGCHCtWM9Q9PEnd8ftZw using mail.gmail.com
> >>>>
> >      >      >      >>> Content-Type: text/plain; charset="utf-8"
> >      >      >      >>>
> >      >      >      >>> Dear John (and anyone else qualified to comment),
> >      >      >      >>>
> >      >      >      >>> I fit lots of mixed-effects multinomial models in
> my
> >      >     research,
> >      >      >     and I
> >      >      >      >> would
> >      >      >      >>> like to see some (multi)collinearity diagnostics
> >     on the
> >      >     fixed
> >      >      >     effects, of
> >      >      >      >>> which there are over 30. My models are fit using
> the
> >      >     Bayesian
> >      >      >     *brms*
> >      >      >      >>> package because I know of no frequentist packages
> >     with
> >      >      >     multinomial GLMM
> >      >      >      >>> compatibility.
> >      >      >      >>>
> >      >      >      >>> With continuous or dichotomous outcomes, my go-to
> >      >     function for
> >      >      >      >> calculating
> >      >      >      >>> multicollinearity diagnostics is of course
> >     *vif()* from
> >      >     the *car*
> >      >      >      >> package.
> >      >      >      >>> As expected, however, this function does not
> >     report sensible
> >      >      >     diagnostics
> >      >      >      >>> for multinomial models -- not even for standard
> >     ones fit
> >      >     by the
> >      >      >     *nnet*
> >      >      >      >>> package's *multinom()* function. The reason, I
> >     presume, is
> >      >      >     because a
> >      >      >      >>> multinomial model is not really one but C-1
> >     regression
> >      >     models
> >      >      >     (where C
> >      >      >      >> is
> >      >      >      >>> the number of response categories) and the *vif()*
> >      >     function is not
> >      >      >      >> designed
> >      >      >      >>> to deal with this scenario.
> >      >      >      >>>
> >      >      >      >>> Therefore, in order to obtain meaningful
> collinearity
> >      >     metrics,
> >      >      >     my present
> >      >      >      >>> plan is to write a simple helper function that
> uses
> >      >     *vif() *to
> >      >      >     calculate
> >      >      >      >>> and present (generalized) variance inflation
> >     metrics for
> >      >     the C-1
> >      >      >      >>> sub-datasets to which the C-1 component binomial
> >     models
> >      >     of the
> >      >      >     overall
> >      >      >      >>> multinomial model are fit. In other words, it
> >     will partition
> >      >      >     the data
> >      >      >      >> into
> >      >      >      >>> those C-1 subsets, and then apply *vif()* to as
> >     many linear
> >      >      >     regressions
> >      >      >      >>> using a made-up continuous response and the fixed
> >     effects of
> >      >      >     interest.
> >      >      >      >>>
> >      >      >      >>> Does this seem like a sensible approach?
> >      >      >      >>>
> >      >      >      >>> Best,
> >      >      >      >>>
> >      >      >      >>> Juho
> >      >      >      >>>
> >      >      >      >>>
> >      >      >      >>>
> >      >      >      >>
> >      >      >      >>          [[alternative HTML version deleted]]
> >      >      >      >>
> >      >      >      >> _______________________________________________
> >      >      >      >> R-sig-mixed-models using r-project.org
> >     <mailto:R-sig-mixed-models using r-project.org>
> >      >     <mailto:R-sig-mixed-models using r-project.org
> >     <mailto:R-sig-mixed-models using r-project.org>>
> >      >      >     <mailto:R-sig-mixed-models using r-project.org
> >     <mailto:R-sig-mixed-models using r-project.org>
> >      >     <mailto:R-sig-mixed-models using r-project.org
> >     <mailto:R-sig-mixed-models using r-project.org>>> mailing list
> >      >      >      >>
> >     https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >     <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
> >      >     <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >     <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>>
> >      >      >
> >       <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >     <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
> >      >     <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >     <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>>>
> >      >      >      >>
> >      >      >      >
> >      >      >      >       [[alternative HTML version deleted]]
> >      >      >      >
> >      >      >      > _______________________________________________
> >      >      >      > R-sig-mixed-models using r-project.org
> >     <mailto:R-sig-mixed-models using r-project.org>
> >      >     <mailto:R-sig-mixed-models using r-project.org
> >     <mailto:R-sig-mixed-models using r-project.org>>
> >      >      >     <mailto:R-sig-mixed-models using r-project.org
> >     <mailto:R-sig-mixed-models using r-project.org>
> >      >     <mailto:R-sig-mixed-models using r-project.org
> >     <mailto:R-sig-mixed-models using r-project.org>>> mailing list
> >      >      >      >
> >     https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >     <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
> >      >     <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >     <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>>
> >      >      >
> >       <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >     <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
> >      >     <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >     <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>>>
> >      >      >     --
> >      >      >     John Fox, Professor Emeritus
> >      >      >     McMaster University
> >      >      >     Hamilton, Ontario, Canada
> >      >      >     web: https://socialsciences.mcmaster.ca/jfox/
> >     <https://socialsciences.mcmaster.ca/jfox/>
> >      >     <https://socialsciences.mcmaster.ca/jfox/
> >     <https://socialsciences.mcmaster.ca/jfox/>>
> >      >      >     <https://socialsciences.mcmaster.ca/jfox/
> >     <https://socialsciences.mcmaster.ca/jfox/>
> >      >     <https://socialsciences.mcmaster.ca/jfox/
> >     <https://socialsciences.mcmaster.ca/jfox/>>>
> >      >      >
> >      >     --
> >      >     John Fox, Professor Emeritus
> >      >     McMaster University
> >      >     Hamilton, Ontario, Canada
> >      >     web: https://socialsciences.mcmaster.ca/jfox/
> >     <https://socialsciences.mcmaster.ca/jfox/>
> >      >     <https://socialsciences.mcmaster.ca/jfox/
> >     <https://socialsciences.mcmaster.ca/jfox/>>
> >      >
> >     --
> >     John Fox, Professor Emeritus
> >     McMaster University
> >     Hamilton, Ontario, Canada
> >     web: https://socialsciences.mcmaster.ca/jfox/
> >     <https://socialsciences.mcmaster.ca/jfox/>
> >
> ------------------------------------------------------------------------
> --
> John Fox, Professor Emeritus
> McMaster University
> Hamilton, Ontario, Canada
> web: https://socialsciences.mcmaster.ca/jfox/
>
>

	[[alternative HTML version deleted]]



More information about the R-sig-mixed-models mailing list