[R] Condition indexes and variance inflation factors
John Fox
jfox at mcmaster.ca
Wed Jul 23 21:52:54 CEST 2003
Dear Peter and Uwe,
I don't have a copy of Belsley's 1991 book here, but I do have Belsley,
Kuh, and Welsch, Regression Diagnostics (Wiley, 1980). If my memory is
right, the approach is the same: Belsley's collinearity diagnostics are
based on a singular-value decomposition of the scaled but uncentred model
matrix. A straightforward, if inelegant, rendition is
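
belsley <- function(model){
    # scale the columns of the uncentred model matrix to unit length
    X <- model.matrix(model)
    X <- scale(X, center=FALSE)/sqrt(nrow(X) - 1)
    svd.X <- svd(X)
    result <- list(singular.values = svd.X$d,
        condition.indices = max(svd.X$d)/svd.X$d)
    # phi[k, j] = v[k, j]^2/d[j]^2; dividing each row by its sum gives the
    # variance-decomposition proportions for coefficient k
    phi <- sweep(svd.X$v^2, 2, svd.X$d^2, "/")
    # transpose so that rows are singular values and columns are coefficients
    Pi <- t(sweep(phi, 1, rowSums(phi), "/"))
    colnames(Pi) <- names(coef(model))
    rownames(Pi) <- 1:nrow(Pi)
    result$pi <- Pi
    class(result) <- "belsley"
    result
}

print.belsley <- function(x, digits = 3, ...){
    cat("\nSingular values: ", x$singular.values)
    cat("\nCondition indices: ", x$condition.indices)
    cat("\n\nVariance-decomposition proportions\n")
    print(round(x$pi, digits))
    invisible(x)   # return the object invisibly, as print methods should
}
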
This gives the singular values, condition indices, and
variance-decomposition proportions. (I'm pretty sure that you can get the
same thing more elegantly from the QR decomposition, but I don't know how
off the top of my head -- someone else on the list can doubtless supply the
details.)
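
One way the QR remark could be followed up is sketched here; it is not part of the original post. If X = QR and Q has orthonormal columns, then X and the small triangular factor R share the same singular values, so the condition indices can be computed from R alone. The data are simulated and purely hypothetical, and the sketch still calls svd() on R, so the gain is only that R is p x p rather than n x p.

```r
set.seed(123)                                  # hypothetical simulated data
X <- cbind(1, matrix(rnorm(200), 50, 4))       # constant plus four predictors
X <- sweep(X, 2, sqrt(colSums(X^2)), "/")      # unit-length columns, uncentred

R <- qr.R(qr(X))                               # 5 x 5 triangular factor
d.qr  <- svd(R)$d                              # singular values from R
d.svd <- svd(X)$d                              # singular values from X itself

print(all.equal(d.qr, d.svd))                  # the two sets should agree
print(max(d.qr)/d.qr)                          # condition indices
```

Any column pivoting done by qr() only permutes the columns of X, which leaves the singular values unchanged. The variance-decomposition proportions would additionally need the rows of the right singular vectors put back in the original column order using the pivot vector.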
For example, for the illustration on p. 161 of BKW,
> X
   V1  V2  V3     V4     V5
1 -74  80  18    -56   -112
2  14 -69  21     52    104
3  66 -72  -5    764   1528
4 -12  66 -30   4096   8192
5   3   8  -7 -13276 -26552
6   4 -12   4   8421  16842
> mod <- lm(y ~ X - 1) # nb., y was just randomly generated
> belsley(mod)

Singular values:  1.414214 1.361734 1.066707 0.08840437 3.614479e-17
Condition indices:  1 1.038538 1.325775 15.9971 3.912635e+16

Variance-decomposition proportions
    XV1   XV2   XV3 XV4 XV5
1 0.000 0.000 0.000   0   0
2 0.005 0.005 0.000   0   0
3 0.001 0.001 0.047   0   0
4 0.994 0.994 0.953   0   0
5 0.000 0.000 0.000   1   1

which is in good agreement with the values given in the text.
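
The post does not show how X was entered, so here is a minimal sketch for reproducing the singular values and condition indices without fitting a model. It assumes the matrix is typed in exactly as printed above. The scaling step is written with sweep(), which is algebraically the same as the scale()/sqrt(n - 1) step in belsley().

```r
# Data as printed in the post (BKW, p. 161); note that V5 is exactly 2 * V4
X <- matrix(c(-74,  80,  18,    -56,   -112,
               14, -69,  21,     52,    104,
               66, -72,  -5,    764,   1528,
              -12,  66, -30,   4096,   8192,
                3,   8,  -7, -13276, -26552,
                4, -12,   4,   8421,  16842),
            nrow = 6, byrow = TRUE,
            dimnames = list(NULL, paste0("V", 1:5)))

Xs <- sweep(X, 2, sqrt(colSums(X^2)), "/")   # unit-length columns, uncentred
d  <- svd(Xs)$d

print(signif(d[1:4], 7))          # compare with the singular values printed above
print(signif(max(d)/d[1:4], 7))   # compare with the condition indices
print(d[5])                       # numerically zero, since V5 = 2 * V4
```

Because V5 is an exact multiple of V4, the fifth singular value is zero up to rounding error, so the last condition index is astronomically large and its printed value is not meaningful.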
Now some comments:
(1) I've never liked this approach for a model with a constant, where it
makes more sense to me to centre the data. I realize that opinions differ
here, but it seems to me that failing to centre the data conflates
collinearity with numerical instability.
(2) I also disagree with the comment that condition indices are easier to
interpret than variance-inflation factors. In either case, since
collinearity is a continuous phenomenon, cutoffs for large values are
necessarily arbitrary.
(3) If you're interested in figuring out which variables are involved in
each collinear relationship, then (for centred and scaled data) you can
equivalently (and to me, more intuitively) work with the
principal-components analysis of the predictors.
(4) I have doubts about the whole enterprise. Collinearity is one source of
imprecision -- others are small sample size, homogeneous predictors, and
large error variance. Aren't the coefficient standard errors the bottom
line? If these are sufficiently small, why worry?
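
The four comments above can be illustrated on simulated data; the following is only a sketch, not part of the original post. All variable names and the data-generating setup are hypothetical: x2 is nearly a copy of x1, x3 is unrelated, and all predictors have means far from zero. The helper functions unit.length() and cond.index() are invented for this example.

```r
set.seed(42)                       # hypothetical simulated data
n  <- 100
x1 <- rnorm(n, mean = 50)
x2 <- x1 + rnorm(n, sd = 0.1)      # nearly collinear with x1
x3 <- rnorm(n, mean = 50)          # unrelated to x1 and x2
y  <- 1 + x1 + x2 + x3 + rnorm(n)
X  <- cbind(x1, x2, x3)
mod <- lm(y ~ x1 + x2 + x3)

unit.length <- function(M) sweep(M, 2, sqrt(colSums(M^2)), "/")
cond.index  <- function(M) { d <- svd(unit.length(M))$d; max(d)/d }

# (1) condition indices without centring (constant included) and with
#     centring (constant removed)
ci.uncentred <- cond.index(model.matrix(mod))
ci.centred   <- cond.index(scale(X, scale = FALSE))

# (2) variance-inflation factors: diagonal of the inverse correlation matrix
vif <- diag(solve(cor(X)))
names(vif) <- colnames(X)

# (3) principal components of the standardized predictors; the loadings on
#     the smallest component show which variables are nearly collinear
pc <- prcomp(X, scale. = TRUE)
small.loadings <- pc$rotation[, ncol(X)]

# (4) standard errors rebuilt from error variance, sample size, predictor
#     variance, and VIF: SE_j^2 = s^2 * VIF_j / ((n - 1) * var(x_j))
s2 <- summary(mod)$sigma^2
se.rebuilt <- sqrt(s2 * vif / ((n - 1) * apply(X, 2, var)))
se.lm <- coef(summary(mod))[colnames(X), "Std. Error"]

print(round(ci.uncentred, 1))
print(round(ci.centred, 1))
print(round(vif, 1))
print(round(small.loadings, 2))
print(all.equal(unname(se.rebuilt), unname(se.lm)))
```

In this setup the uncentred condition indices should come out much larger than the centred ones, because every column lies close to the constant. The VIFs and the loadings on the smallest component should both single out x1 and x2. The last line checks that the usual standard errors factor into error variance, sample size, predictor variance, and VIF, which is the sense in which collinearity is only one of several sources of imprecision.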
I hope that this helps.
John
At 05:35 PM 7/23/2003 +0200, Uwe Ligges wrote:
>Peter Flom wrote:
>
>>Has anyone programmed condition indexes in R?
>>I know that there is a function for variance inflation factors
>>available in the car package; however, Belsley (1991) Conditioning
>>Diagnostics (Wiley) notes that there are several weaknesses of VIFs:
>>e.g. 1) High VIFs are sufficient but not necessary conditions for
>>collinearity 2) VIFs don't diagnose the number of collinearities and 3)
>>No one has determined how high a VIF has to be for the collinearity to
>>be damaging.
>>He then develops and suggests using condition indexes instead, so I was
>>wondering if anyone had programmed them.
>>Thanks
>>Peter
>
>
>I think Juergen Gross has something like that in his new book
>Gross, J. (2003): Linear Regression, Springer (in press - OK, not very
>helpful here).
>
>You might want to contact him privately (in CC).
>
>Uwe Ligges
>
-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox