[R] Condition indexes and variance inflation factors
Peter Flom
flom at ndri.org
Thu Jul 24 14:24:35 CEST 2003
Thanks for all the help.
Juergen Gross supplied a program which does just what Belsley
suggested.
Chuck Cleland, John Fox and Andy Liaw all made useful programming
suggestions.
John Fox asked
<<<
(1) I've never liked this approach for a model with a constant, where it
makes more sense to me to centre the data. I realize that opinions differ
here, but it seems to me that failing to centre the data conflates
collinearity with numerical instability.
>>>
Opinions do differ. A few years ago I could have given more details
(my dissertation was on this topic, but many of the details have faded
from memory). I think, though, that Belsley is looking for a measure
that deals not only with collinearity but also with several other
problems, including numerical instability (the subtitle of his later
book is "Collinearity and Weak Data in Regression"). I remember being
convinced that centering was generally not a good idea, but there are
lots of people who disagree and who know a lot more statistics than I
do.
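To make the disagreement concrete, here is a minimal sketch in R. It is
not Juergen Gross's program; the helper name cond_index and the toy data
are made up for illustration. It scales the columns of the model matrix
(constant included) to unit length, takes the singular values, and
reports the ratio of the largest to each one, first with the raw
predictors and then with centred ones.

```r
# Hypothetical helper: condition indexes of a model matrix whose columns
# have been scaled to unit Euclidean length.
cond_index <- function(X) {
  Xs <- sweep(X, 2, sqrt(colSums(X^2)), "/")  # unit-length columns
  d <- svd(Xs)$d                              # singular values, descending
  max(d) / d                                  # one index per singular value
}

# Toy data (an assumption): x1 sits far from zero relative to its spread,
# x2 is only weakly related to x1.
x1 <- 100 + 1:20
x2 <- rep(1:4, times = 5)

# Uncentred: the constant column is kept alongside the raw predictors.
ci_raw <- cond_index(cbind(1, x1, x2))

# Centred: each predictor has its mean removed, so it is orthogonal
# to the constant column.
ci_cen <- cond_index(cbind(1, x1 - mean(x1), x2 - mean(x2)))

print(round(ci_raw, 2))
print(round(ci_cen, 2))
```

With these data the largest uncentred index is well above 30, almost
entirely because x1 is nearly proportional to the constant, while the
centred indexes all stay below 2. Whether the first result is a useful
warning or an artefact is exactly where opinions differ.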
<<<
(2) I also disagree with the comment that condition indices are easier to
interpret than variance-inflation factors. In either case, since
collinearity is a continuous phenomenon, cutoffs for large values are
necessarily arbitrary.
>>>
While any cutoff is arbitrary (and Belsley advises against applying one
rigidly), he does provide some evidence of how regression models are
affected at different levels of the condition index.
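A small sketch in R of why any cutoff is a judgment call, under a
deliberately narrow assumption: two centred and standardized predictors
with correlation r (so this is not Belsley's uncentred index). In that
case the VIF is 1/(1 - r^2) and the condition number of the predictor
matrix is sqrt((1 + r)/(1 - r)). The function name collin_measures is
made up; it computes both quantities numerically from the 2 x 2
correlation matrix.

```r
# Hypothetical helper: VIF and condition number for two standardized
# predictors whose correlation is r.
collin_measures <- function(r) {
  R <- matrix(c(1, r, r, 1), nrow = 2)        # correlation matrix
  ev <- eigen(R, symmetric = TRUE)$values     # eigenvalues 1 + r and 1 - r
  c(r = r,
    vif = diag(solve(R))[1],                  # diagonal of the inverse
    cond = sqrt(max(ev) / min(ev)))           # ratio of singular values
}

res <- t(sapply(c(0, 0.5, 0.9, 0.99, 0.999), collin_measures))
print(round(res, 2))
```

Both measures rise smoothly as r approaches 1, with no natural break
at which collinearity "starts".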
<<<
(3) If you're interested in figuring out which variables are involved in
each collinear relationship, then (for centred and scaled data) you can
equivalently (and to me, more intuitively) work with the
principal-components analysis of the predictors.
>>>
This would also work.
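A minimal sketch in R of the principal-components route, using
simulated data (the variables x1 to x4 and the seed are assumptions for
illustration). x3 is built as x1 + x2 plus a little noise, and x4 is
unrelated. The component with the smallest standard deviation marks the
near-dependency, and its loadings show which predictors take part.

```r
set.seed(1)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2 + rnorm(n, sd = 0.01)  # nearly an exact linear combination
x4 <- rnorm(n)                       # not involved in the dependency
X <- cbind(x1, x2, x3, x4)

# PCA of the centred and scaled predictors
pc <- prcomp(X, center = TRUE, scale. = TRUE)

# Standard deviations of the components; the last one is close to zero
print(round(pc$sdev, 4))

# Loadings of the smallest component: large for x1, x2 and x3,
# near zero for x4
last_load <- pc$rotation[, ncol(X)]
print(round(last_load, 3))
```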
<<<
(4) I have doubts about the whole enterprise. Collinearity is one source of
imprecision -- others are small sample size, homogeneous predictors, and
large error variance. Aren't the coefficient standard errors the bottom
line? If these are sufficiently small, why worry?
>>>
>>>
I think (correct me if I am wrong) that the standard errors and the
condition indices serve very different purposes. The condition indices
are meant to show whether small changes in the input data could make big
differences in the results. Belsley provides some examples where a tiny
change in the data produces completely different results: different
standard errors, different coefficients (even with reversed signs), and
so on.
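Here is a small sketch in R of that kind of sensitivity. The data are
contrived and deterministic (they are not one of Belsley's examples),
chosen so that the effect is reproducible exactly: x2 is almost equal to
x1, y is generated as x1 + x2 plus a small disturbance, and the
"perturbed" fit changes each value of x2 by only 0.01.

```r
# Contrived data (an assumption for illustration)
x1 <- 1:8
a <- c(1, -1, -1, 1, 1, -1, -1, 1)   # small pattern separating x2 from x1
b <- c(1, 1, -1, -1, -1, -1, 1, 1)   # pattern used for the perturbation
x2 <- x1 + 0.01 * a                  # x2 is nearly collinear with x1
y <- x1 + x2 + 0.02 * a - 0.06 * b   # true coefficients 1 and 1

fit1 <- lm(y ~ x1 + x2)

# Perturb x2 by at most 0.01 in absolute value and refit
x2p <- x2 + 0.01 * b
fit2 <- lm(y ~ x1 + x2p)

print(round(coef(fit1), 3))  # x1 about -1,  x2 about 3
print(round(coef(fit2), 3))  # x1 about 3.5, x2p about -1.5
```

Both slope coefficients change sign, although no value of x2 moved by
more than 0.01 on a scale running from 1 to 8.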
Peter
Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
National Development and Research Institutes
71 W. 23rd St
New York, NY 10010
www.peterflom.com
(212) 845-4485 (voice)
(917) 438-0894 (fax)