[R] Which columns give rise to linear dependency?
John Fox
jfox at mcmaster.ca
Tue Nov 5 16:03:12 CET 2002
Dear Michael,
There are several ways of finding near dependencies. For example, Belsley,
Kuh, and Welsch in Regression Diagnostics (1980) use the singular-value
decomposition. Here are a couple of simple approaches:
(1) Use a principal-component analysis of the standardized X-matrix. Very
small component variances correspond to near collinearities, and the
corresponding principal-component coefficients give you linear combinations
of the standardized x's that are nearly equal to 0.
(2) Look at the variance-inflation factors. Very large VIFs correspond to
variables that are nearly linearly dependent on others; regress each such
variable on the others to see what the dependencies are. (Some of these
regressions will be redundant.)
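As a rough sketch of both approaches in R, here is a small made-up example.
The data are hypothetical (x3 is constructed to be nearly x1 + x2, and x4 is
unrelated), and computing the VIFs as the diagonal of the inverse correlation
matrix is just one convenient route, which assumes that the correlation matrix
is nearly, but not exactly, singular:

```r
# Hypothetical data: x3 is nearly x1 + x2; x4 is unrelated to the rest.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2 + rnorm(n, sd = 0.01)
x4 <- rnorm(n)
X  <- cbind(x1, x2, x3, x4)

# (1) Principal components of the standardized X-matrix.
pc <- prcomp(X, center = TRUE, scale. = TRUE)
pc$sdev^2                        # component variances; the last is near 0
small <- which.min(pc$sdev^2)    # index of the smallest component variance
round(pc$rotation[, small], 3)   # coefficients of the near-zero combination

# (2) Variance-inflation factors, VIF_j = 1 / (1 - R_j^2), read off the
# diagonal of the inverse correlation matrix. solve() will stop with an
# error if the dependency is exact rather than near.
vif <- diag(solve(cor(X)))
vif

# Regress a high-VIF variable on the others to see the dependency.
fit <- lm(x3 ~ x1 + x2 + x4)
coef(fit)
```

In this example the coefficients for x1, x2, and x3 in the last component are
far from 0 while the one for x4 is close to 0, and the VIFs for x1, x2, and x3
are very large while the VIF for x4 is close to 1. With an exact dependency,
the principal-component approach still works, but the inverse in (2) does not
exist.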
I hope that this helps,
John
At 12:24 PM 11/5/2002 +0000, Michael Dewey wrote:
>Short version
>
>If I have a data frame X and I suspect
>that there is a dependency between
>the columns, how do I confirm that,
>and how do I tell which subset of columns
>is involved?
>
>==================================
>
>Long version
>
>A colleague had been trying to use
>the SPSS RELIABILITY procedure.
>It told her that the determinant of the
>matrix was small. She asked me what that meant
>and I told her that one of her variables was a
>linear combination of others.
>I agreed to investigate further and imported
>the datasets into R. The rows of each X represent
>people, and the columns items. The x_{ij} are binary (coded
>0/1). Three of the datasets gave the
>error message from SPSS. I confirmed that
>the matrix involved was indeed var(X)
>and that det(var(X)) agreed with SPSS.
>What I thought was that I would find
>that the smallest eigenvalues would
>be zero, but in two of the datasets that was not true.
>In the third dataset I traced the problem quickly
>to a pair of items which were
>perfectly correlated.
>
>1 I suspect that det(var(X)) is a poor test of
> whether X is of reduced rank. I have also looked at kappa(X)
> which gives values of 10 and 17 for the two offending scales,
> but I have no feel for whether that is high (bad?).
>2 I thought that by doing svd(X) and then
> examining V I could answer my problem.
> However the elements of V, specifically
> the last column, did not show what I
> hoped: most values effectively
> zero and the rest adding to zero.
> This did work for the third dataset though.
>3 I think that SPSS was trying to invert
> var(X) in order to compute the multiple
> correlation of each item with the others.
> Is there any neat way of doing that in R?
>
>I am using 1.5.1 on Windows 98 if that makes
>a difference.
>
>If anyone wants to look at one of the datasets
>I have her permission to make it available.
>Point your browser at http://www.aghmed.fsnet.co.uk/r.html
>
>
>Michael Dewey
>michael.dewey at nottingham.ac.uk
>http://www.nottingham.ac.uk/~mhzmd/home.html
>
>
>
>-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
>r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
>Send "info", "help", or "[un]subscribe"
>(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
>_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox
-----------------------------------------------------