[R] Condition indexes and variance inflation factors

John Fox jfox at mcmaster.ca
Wed Jul 23 21:52:54 CEST 2003


Dear Peter and Uwe,

I don't have a copy of Belsley's 1991 book here, but I do have Belsley,
Kuh, and Welsch (BKW), Regression Diagnostics (Wiley, 1980). If my memory
is right, the approach is the same: Belsley's collinearity diagnostics are
based on a singular-value decomposition of the model matrix, with each
column scaled to unit length but not centred. A straightforward, if
inelegant, rendition is

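belsley <- function(model){
     # Model matrix, including the constant column if the model has one
     X <- model.matrix(model)
     # scale(center=FALSE) divides each column by sqrt(sum(x^2)/(n - 1));
     # dividing again by sqrt(n - 1) leaves columns of unit length
     X <- scale(X, center=FALSE)/sqrt(nrow(X) - 1)
     svd.X <- svd(X)
     # Condition indices: largest singular value over each singular value
     result <- list(singular.values = svd.X$d,
                    condition.indices = max(svd.X$d)/svd.X$d)
     # phi[k, j] = v[k, j]^2 / d[j]^2 (rows: coefficients, cols: components)
     phi <- sweep(svd.X$v^2, 2, svd.X$d^2, "/")
     # Variance-decomposition proportions: normalize each coefficient's row
     # to sum to 1, then transpose (rows: components, cols: coefficients)
     Pi <- t(sweep(phi, 1, rowSums(phi), "/"))
     colnames(Pi) <- names(coef(model))
     rownames(Pi) <- 1:nrow(Pi)
     result$pi <- Pi
     class(result) <- "belsley"
     result
     }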

print.belsley <- function(x, digits = 3, ...){
     cat("\nSingular values: ", x$singular.values)
     cat("\nCondition indices: ", x$condition.indices)
     cat("\n\nVariance-decomposition proportions\n")
     print(round(x$pi, digits))
     invisible(x)
     }

This gives the singular values, condition indices, and
variance-decomposition proportions. (I'm pretty sure that you can get the
same thing more elegantly from the QR decomposition, but I don't know how
off the top of my head; someone else on the list can doubtless supply the
details.)
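For reference, the quantities that belsley() returns can be written out as follows. This is read off the code rather than quoted from BKW: $\tilde{X}$ is the model matrix with unit-length columns, $\mu_j$ are its singular values (svd.X$d), $v_{kj}$ are the elements of $V$ (svd.X$v), $\phi_{kj}$ corresponds to phi[k, j], and $\pi_{jk}$ to Pi[j, k]. The variance line holds for the coefficients $\tilde{b}_k$ of the regression on the scaled columns.

```latex
\begin{aligned}
\tilde{X} &= U D V^{\top}, \quad D = \operatorname{diag}(\mu_1, \ldots, \mu_p), \quad \mu_1 \ge \cdots \ge \mu_p, \\
\eta_j &= \mu_1 / \mu_j \quad \text{(condition indices)}, \\
\operatorname{Var}(\tilde{b}_k) &= \sigma^2 \sum_{j=1}^{p} \frac{v_{kj}^2}{\mu_j^2}
  = \sigma^2 \sum_{j=1}^{p} \phi_{kj} = \sigma^2 \phi_k, \\
\pi_{jk} &= \phi_{kj} / \phi_k \quad \text{(variance-decomposition proportions)}.
\end{aligned}
```

Because $\mu_j$ enters squared in a denominator, a single small singular value can account for most of the variance of several coefficients at once. That is the pattern the proportions table is meant to reveal.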

For example, for the illustration on p. 161 of BKW,

 > X
    V1  V2  V3     V4     V5
1 -74  80  18    -56   -112
2  14 -69  21     52    104
3  66 -72  -5    764   1528
4 -12  66 -30   4096   8192
5   3   8  -7 -13276 -26552
6   4 -12   4   8421  16842
 > mod <- lm(y ~ X - 1)  # NB: y was just randomly generated
 > belsley(mod)

Singular values:  1.414214 1.361734 1.066707 0.08840437 3.614479e-17
Condition indices:  1 1.038538 1.325775 15.9971 3.912635e+16

Variance-decomposition proportions
     XV1   XV2   XV3 XV4 XV5
1 0.000 0.000 0.000   0   0
2 0.005 0.005 0.000   0   0
3 0.001 0.001 0.047   0   0
4 0.994 0.994 0.953   0   0
5 0.000 0.000 0.000   1   1

which is in good agreement with the values given in the text.
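Since the diagnostics depend only on the model matrix, the numbers above can be checked without lm() or y. The sketch below re-enters the p. 161 matrix by hand from the table above and applies the same unit-length scaling as belsley(). It also checks two features of the matrix that explain the output. First, V5 is exactly 2 * V4, which is why the smallest singular value is essentially zero. Second, V4 is orthogonal to V1 through V3, which is why the largest singular value is sqrt(2) and why XV4 and XV5 have proportions of 0 on components 2 to 4. The object names are illustrative.

```r
# BKW p. 161 matrix, entered by hand from the table above
X <- matrix(c(
  -74,  80,  18,    -56,   -112,
   14, -69,  21,     52,    104,
   66, -72,  -5,    764,   1528,
  -12,  66, -30,   4096,   8192,
    3,   8,  -7, -13276, -26552,
    4, -12,   4,   8421,  16842),
  ncol = 5, byrow = TRUE, dimnames = list(NULL, paste0("V", 1:5)))

# Exact linear dependency: V5 = 2 * V4
stopifnot(isTRUE(all.equal(X[, "V5"], 2 * X[, "V4"])))
# V4 (and hence V5) is orthogonal to V1, V2 and V3
stopifnot(all(crossprod(X[, 1:3], X[, "V4"]) == 0))

# Same scaling as belsley(): unit-length columns, no centring
Xs <- sweep(X, 2, sqrt(colSums(X^2)), "/")
d <- svd(Xs)$d              # singular values, in decreasing order
cond.idx <- max(d) / d      # condition indices

print(signif(d, 7))
print(signif(cond.idx, 7))
```

The first four printed values should agree with the output shown above to the displayed precision. The fifth singular value is rounding noise, so its exact value, and therefore the last condition index, will vary from machine to machine.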

Now some comments:

(1) I've never liked this approach for a model with a constant, where it 
makes more sense to me to centre the data. I realize that opinions differ 
here, but it seems to me that failing to centre the data conflates 
collinearity with numerical instability.

(2) I also disagree with the comment that condition indices are easier to 
interpret than variance-inflation factors. In either case, since 
collinearity is a continuous phenomenon, cutoffs for large values are 
necessarily arbitrary.

(3) If you're interested in figuring out which variables are involved in 
each collinear relationship, then (for centred and scaled data) you can 
equivalently (and to me, more intuitively) work with the 
principal-components analysis of the predictors.

(4) I have doubts about the whole enterprise. Collinearity is one source of 
imprecision -- others are small sample size, homogeneous predictors, and 
large error variance. Aren't the coefficient standard errors the bottom 
line? If these are sufficiently small, why worry?
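Comment (1) can be illustrated with a small made-up example. The vectors a and b and the helper cond_indices() are hypothetical and are not part of the belsley() code above. The two predictors are exactly uncorrelated but have means far from zero. With the constant included and nothing centred, each column is nearly proportional to the constant, so the condition index is large. After centring and dropping the constant, both condition indices are 1.

```r
# Hypothetical predictors: exactly uncorrelated, but with large means
a  <- c(-1,  1, -1, 1, -1,  1, -1, 1)
b  <- c(-1, -1,  1, 1, -1, -1,  1, 1)
x1 <- 100 + a
x2 <- 50 + b

# Condition indices after scaling each column of M to unit length
cond_indices <- function(M) {
  M <- sweep(M, 2, sqrt(colSums(M^2)), "/")
  d <- svd(M)$d
  max(d) / d
}

# BKW-style: constant column included, nothing centred
ci.uncentred <- cond_indices(cbind(1, x1, x2))

# Centred: drop the constant and centre the predictors (unit-length
# scaling is then done inside cond_indices())
ci.centred <- cond_indices(scale(cbind(x1, x2), center = TRUE, scale = FALSE))

print(round(ci.uncentred, 1))
print(ci.centred)
```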
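On comment (2), VIFs are simple to compute without an add-on package. For a model with a constant, the VIF for predictor j is the jth diagonal element of the inverse of the predictors' correlation matrix, or equivalently 1/(1 - R_j^2), where R_j^2 comes from regressing x_j on the other predictors. The following is a base-R sketch on simulated data, not the car package's implementation. The seed, sample size and variable names are arbitrary.

```r
set.seed(123)                         # arbitrary; data are simulated
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2 + rnorm(n, sd = 0.3)    # built to be collinear with x1, x2
X  <- cbind(x1, x2, x3)

# VIF_j as the j-th diagonal element of the inverse correlation matrix
vif.base <- diag(solve(cor(X)))

# The same quantity as 1 / (1 - R_j^2), regressing x_j on the others
vif.r2 <- sapply(seq_len(ncol(X)), function(j) {
  r2 <- summary(lm(X[, j] ~ X[, -j]))$r.squared
  1 / (1 - r2)
})

print(round(vif.base, 2))
```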
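On comment (3): when the predictors are centred and scaled, the condition indices are ratios of principal-component standard deviations. This means prcomp() gives the same numbers, and the loadings of the smallest components show which variables are involved. The sketch below uses simulated data in which x3 is deliberately built to be nearly x1 - x2. All names are illustrative.

```r
set.seed(1)                           # arbitrary; data are simulated
n  <- 40
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 - x2 + rnorm(n, sd = 0.1)    # near-dependency among x1, x2, x3
X  <- cbind(x1, x2, x3)

# Condition indices from centred, unit-length columns (no constant)
Z <- scale(X) / sqrt(n - 1)
d <- svd(Z)$d
ci.svd <- max(d) / d

# The same ratios from a PCA of the correlation matrix
pca <- prcomp(X, scale. = TRUE)
ci.pca <- pca$sdev[1] / pca$sdev

print(round(ci.svd, 2))
# Loadings of the last (smallest) component: which variables are involved
print(round(pca$rotation[, ncol(X)], 2))
```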
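Comment (4) can be made concrete with a standard least-squares identity for a model with a constant. The estimated standard error of the coefficient of x_j equals sigma-hat * sqrt(VIF_j) / (sqrt(n - 1) * s_j), where s_j is the standard deviation of x_j. Error variance, sample size, predictor spread and collinearity each enter as a separate factor. The following sketch checks the identity against lm() on simulated data, with all values arbitrary.

```r
set.seed(42)                          # arbitrary; data are simulated
n  <- 30
x1 <- rnorm(n, sd = 2)
x2 <- 0.8 * x1 + rnorm(n)
y  <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(n, sd = 1.5)
fit <- lm(y ~ x1 + x2)

# Standard error of the x1 coefficient as reported by lm()
se.lm <- sqrt(diag(vcov(fit)))[["x1"]]

# The same number assembled from its ingredients
sigma.hat <- summary(fit)$sigma                        # error SD
s.x1      <- sd(x1)                                    # predictor spread
vif.x1    <- 1 / (1 - summary(lm(x1 ~ x2))$r.squared)  # collinearity
se.parts  <- sigma.hat / (sqrt(n - 1) * s.x1) * sqrt(vif.x1)

print(c(lm = se.lm, parts = se.parts))
```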

I hope that this helps.

John

At 05:35 PM 7/23/2003 +0200, Uwe Ligges wrote:
>Peter Flom wrote:
>
>>Has anyone programmed condition indexes in R?
>>I know that there is a function for variance inflation factors
>>available in the car package; however, Belsley (1991) Conditioning
>>Diagnostics (Wiley) notes that there are several weaknesses of VIFs,
>>e.g., (1) high VIFs are sufficient but not necessary conditions for
>>collinearity, (2) VIFs don't diagnose the number of collinearities, and
>>(3) no one has determined how high a VIF has to be for the collinearity
>>to be damaging.
>>He then develops and suggests using condition indexes instead, so I was
>>wondering if anyone had programmed them.
>>Thanks
>>Peter
>
>
>I think Juergen Gross has something like that in his new book
>Gross, J. (2003): Linear Regression, Springer (in press - OK, not very 
>helpful here).
>
>You might want to contact him privately (in CC).
>
>Uwe Ligges
>

-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox



