[R-meta] Testing multicollinearity between categorical predictors

Fri Jun 19 17:52:47 CEST 2020

I don't have a reference, but one doesn't need one for this anyway. A factor with two levels is just a dummy variable. That is computationally indistinguishable from a "continuous" predictor that just happens to take on the values 0 and 1. So, the VIFs will be the same whether we regard this as a factor or as a continuous variable. We can also just examine this by example:

library(metafor)

dat <- dat.bcg
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat)

# 0/1 dummy variables: random allocation (yes/no) and absolute latitude >= 35 (yes/no)
dat$random <- ifelse(dat$alloc == "random", 1, 0)
dat$far    <- ifelse(dat$ablat >= 35, 1, 0)

res <- rma(yi, vi, mods = ~ random*far, data=dat)
res
vif(res)

res <- rma(yi, vi, mods = ~ factor(random)*factor(far), data=dat)
res
vif(res)
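
To see why the two calls give the same VIFs, one can compare the model matrices directly. The following is a minimal sketch with made-up toy vectors (x1 and x2 are hypothetical, not taken from dat.bcg), using only base R:

```r
# Hypothetical toy data (not from dat.bcg): two 0/1 indicators
x1 <- c(0, 0, 0, 1, 1, 1, 0, 1)
x2 <- c(0, 1, 1, 0, 1, 1, 0, 0)

# Design matrix when the indicators are treated as numeric
X.num <- model.matrix(~ x1 * x2)

# Design matrix when the same indicators are treated as factors
X.fac <- model.matrix(~ factor(x1) * factor(x2))

# Compare the numbers only (column names and attributes differ)
same <- isTRUE(all.equal(unname(X.num), unname(X.fac), check.attributes = FALSE))
same
```

If same is TRUE, the two design matrices contain identical numbers and only the column labels differ, so anything computed from them (including the VIFs) must agree.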

Now the more interesting aspect here is that we don't actually have to use 0/1 coding for the factor. We could, for example, also use +-1 coding. This won't change the significance of the interaction term, although it does change the meaning of the "main effects":

# recode the same two dummy variables as +1/-1
dat$random <- ifelse(dat$alloc == "random", 1, -1)
dat$far    <- ifelse(dat$ablat >= 35, 1, -1)
res <- rma(yi, vi, mods = ~ random*far, data=dat)
res

However, this coding can reduce the VIFs quite a bit:

vif(res)

because the correlation between the variables is much lower now:

with(dat, cor(cbind(random, far, random*far)))
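
The link between these correlations and the VIFs can be sketched by hand. In an ordinary (unweighted) regression with an intercept, the VIFs are the diagonal elements of the inverse of the correlation matrix of the predictors. The helper vif.unweighted() below is my own illustration, not a metafor function, and the toy data are made up and perfectly balanced; vif() on an rma object takes the weights into account, so its values will generally differ from this unweighted version:

```r
# Unweighted VIFs: diagonal of the inverse correlation matrix of the predictors
# (hypothetical helper, not part of metafor)
vif.unweighted <- function(X) diag(solve(cor(X)))

# Hypothetical balanced toy data (two observations per cell)
a <- rep(c(0, 1), each = 4)
b <- rep(c(0, 1), times = 4)

# 0/1 coding
X01   <- cbind(a = a, b = b, ab = a * b)
vif01 <- vif.unweighted(X01)

# +-1 coding of the same two variables
ap    <- 2 * a - 1
bp    <- 2 * b - 1
Xpm   <- cbind(a = ap, b = bp, ab = ap * bp)
vifpm <- vif.unweighted(Xpm)

round(rbind("0/1" = vif01, "+-1" = vifpm), 3)
```

In this balanced toy example the +-1 coded variables are exactly uncorrelated, so all of their VIFs are 1, whereas with 0/1 coding the VIFs are 2, 2, and 3. With unbalanced data the correlations typically do not vanish entirely.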

But this also shows that the usefulness of VIFs is questionable, especially for interaction terms. Again, the significance of the interaction term is the same under either coding, and it would be the same regardless of how high the VIF is with 0/1 coding.
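
The invariance of the interaction test can be checked with a small simulation. This sketch assumes an ordinary regression fitted with lm() on simulated toy data (the variable names, cell sizes, and true coefficients are all made up), so it only illustrates the algebra and is not a meta-regression:

```r
set.seed(1234)  # arbitrary seed for the simulated toy data

# Hypothetical unbalanced 2x2 design (cell sizes 9, 3, 2, 6)
a <- rep(c(0, 1), times = c(12, 8))
b <- rep(c(0, 1, 0, 1), times = c(9, 3, 2, 6))
y <- 0.5 * a - 0.3 * b + 0.4 * a * b + rnorm(length(a))

# Model with 0/1 coding
fit01 <- lm(y ~ a * b)

# Same model with +-1 coding
ap <- 2 * a - 1
bp <- 2 * b - 1
fitpm <- lm(y ~ ap * bp)

# t-statistics of the interaction term under the two codings
t01 <- coef(summary(fit01))["a:b", "t value"]
tpm <- coef(summary(fitpm))["ap:bp", "t value"]
all.equal(t01, tpm)

# The interaction coefficient is merely rescaled by a factor of 4
b01 <- unname(coef(fit01)["a:b"])
bpm <- unname(coef(fitpm)["ap:bp"])
all.equal(b01 / 4, bpm)
```

Both comparisons should return TRUE: since (2a-1)(2b-1) = 4ab - 2a - 2b + 1, the two codings span the same model, and the interaction coefficient and its standard error are both divided by 4, which leaves the test statistic unchanged. The coefficients for the "main effects" do differ between the two fits.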

Best,
Wolfgang

>-----Original Message-----
>From: Rafael Rios [mailto:biorafaelrm using gmail.com]
>Sent: Friday, 19 June, 2020 17:33
>To: Viechtbauer, Wolfgang (SP)
>Cc: r-sig-meta-analysis using r-project.org
>Subject: Re: [R-meta] Testing multicollinearity between categorical
>predictors
>
>Thank you very much, Wolfgang. Do you have a reference supporting this
>approach? It will be very helpful.
>
>Best wishes,
>
>Rafael.
>
>Em sex., 19 de jun. de 2020 às 12:03, Viechtbauer, Wolfgang (SP)
><wolfgang.viechtbauer using maastrichtuniversity.nl> escreveu:
>In that case, you can just use vif(). The 'generalized VIF' is only relevant
>when a factor variable has more than two levels and one wants to compute a
>VIF that pertains to the whole factor, not just each of the individual dummy
>variables. But if the factor only has two levels, then there is only one
>dummy variable, so this is the same as GVIF.
>
>Best,
>Wolfgang
>
>>-----Original Message-----
>>From: Rafael Rios [mailto:biorafaelrm using gmail.com]
>>Sent: Friday, 19 June, 2020 16:35
>>To: Viechtbauer, Wolfgang (SP)
>>Cc: r-sig-meta-analysis using r-project.org
>>Subject: Re: [R-meta] Testing multicollinearity between categorical
>>predictors
>>
>>Dear Wolfgang,
>>
>>Yes, it is. Yes and no for each moderator. I am also evaluating their
>>interaction.
>>
>>All the best,
>>
>>Rafael.
>>
>>Em sex., 19 de jun. de 2020 às 10:12, Viechtbauer, Wolfgang (SP)
>><wolfgang.viechtbauer using maastrichtuniversity.nl> escreveu:
>>I am not sure I fully understand. Are you saying that the two moderators
>>have two levels each?
>>
>>Best,
>>Wolfgang
>>
>>>-----Original Message-----
>>>From: Rafael Rios [mailto:biorafaelrm using gmail.com]
>>>Sent: Friday, 19 June, 2020 15:02
>>>To: Michael Dewey
>>>Cc: Viechtbauer, Wolfgang (SP); r-sig-meta-analysis using r-project.org
>>>Subject: Re: [R-meta] Testing multicollinearity between categorical
>>>predictors
>>>
>>>Dear Michael,
>>>
>>>Thank you for the reply. I am evaluating the biases arising from pooling
>>>samples from different populations and periods on the average effect size.
>>>Therefore, I included both pooling practices, and their interaction as
>>>moderators. Each practice has two levels (yes and no).
>>>
>>>Best wishes,
>>>
>>>Rafael.
>>>
>>>Em sex., 19 de jun. de 2020 às 09:50, Michael Dewey
>>><lists using dewey.myzen.co.uk> escreveu:
>>>Dear Rafael
>>>
>>>It is hard to answer here because we do not know what scientific problem
>>>the referee thinks he or she has spotted which would be solved by such a
>>>test. Being of a cynical world view I suspect neither does the referee
>>>and this is a conditioned reflex like Pavlov's dog salivating at the bell.
>>>
>>>Are the two moderators of scientific interest to you or are you
>>>including them so you can say that there is still residual heterogeneity
>>>even after you did your best to explain it? In the latter case I would
>>>suggest collinearity is irrelevant.
>>>
>>>Michael
>>>
>>>On 19/06/2020 13:36, Rafael Rios wrote:
>>>> Dear Wolfgang,
>>>>
>>>> Thank you for the reply. I also thought about using VIF to evaluate
>>>> multicollinearity, but there is a lot of criticism about the applicability
>>>> of VIF for categorical predictors. There is a variation called GVIF.
>>>> However, since the meta-analysis changes categorical predictors to dummy
>>>> variables, I could not use it in R. I am not sure whether this is the best
>>>> approach. Do you know of other methods to evaluate or avoid potential
>>>> multicollinearity among categorical moderators?
>>>>
>>>> Best wishes,
>>>>
>>>> Rafael.
>>>>
>>>> Em sex., 19 de jun. de 2020 às 05:40, Viechtbauer, Wolfgang (SP) <
>>>> wolfgang.viechtbauer using maastrichtuniversity.nl> escreveu:
>>>>
>>>>> Dear Rafael,
>>>>>
>>>>> I don't know what "testing" for multicollinearity would entail. One could
>>>>> examine the variance inflation factors with vif(). What VIF values are
>>>>> considered "large" is debatable though.
>>>>>
>>>>> Best,
>>>>> Wolfgang