[R] Remove highly correlated variables from a data frame or matrix

Abby Spurdle
Thu Nov 14 21:29:39 CET 2019


Sorry, but I don't understand your question.

When I first looked at this, I thought it was a correlation (or
covariance) matrix, e.g. the output of:

> cor (quakes)
> cov (quakes)

However, your row and column variables are different, implying two
different data sets.
Also, some of the (correlation?) coefficients are identical across
rows and across columns, implying that some of the variables are the
same, or very close.
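
For comparison, here is a minimal sketch of the properties I would
expect from a correlation matrix computed on a single data set. It uses
the built-in quakes data purely as an illustration; nothing here is
taken from calc.rho.

```r
# quakes ships with base R (package datasets)
r <- cor(quakes)

# A correlation matrix of one data set is square ...
dim(r)

# ... has the same variables on rows and columns ...
identical(rownames(r), colnames(r))

# ... is symmetric ...
isSymmetric(r)

# ... and has ones on the diagonal (up to floating point error).
all.equal(unname(diag(r)), rep(1, ncol(r)))
```

Running the same checks on calc.rho would show quickly whether it is
this kind of matrix or something else.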

Also, note that a matrix is not a data.frame.
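
A short sketch of the distinction, again using quakes only as a
stand-in example:

```r
m <- cor(quakes)          # cor() returns a numeric matrix
is.matrix(m)              # TRUE
is.data.frame(m)          # FALSE

d <- as.data.frame(m)     # explicit conversion, if a data frame is wanted
is.data.frame(d)          # TRUE
```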


> I have a data frame like this (a matrix):
> head(calc.rho)
>             rs9900318 rs8069906 rs9908521 rs9908336 rs9908870 rs9895995
> rs56192520      0.903     0.268     0.327     0.327     0.327     0.582
> rs3764410       0.928     0.276     0.336     0.336     0.336     0.598
> rs145984817     0.975     0.309     0.371     0.371     0.371     0.638
> rs1807401       0.975     0.309     0.371     0.371     0.371     0.638
> rs1807402       0.975     0.309     0.371     0.371     0.371     0.638
> rs35350506      0.975     0.309     0.371     0.371     0.371     0.638
> > dim(calc.rho)
> [1] 246 246
> I would like to remove from this data all highly correlated variables,
> with correlation more than 0.8
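
If calc.rho does turn out to be a square correlation matrix over one
set of variables, one possible reading of the request is a greedy
filter: walk through the variables in order and drop any later variable
whose absolute correlation with an already kept one exceeds the cutoff.
The sketch below makes that assumption. The helper name drop_correlated
is hypothetical (it is not from the thread or from any package), and
quakes again stands in for the real data.

```r
# Hypothetical helper: returns a logical vector marking the variables
# to keep, so that no kept pair has |correlation| above the cutoff.
# Assumes r is a square correlation matrix of a single set of variables.
drop_correlated <- function(r, cutoff = 0.8) {
  stopifnot(is.matrix(r), nrow(r) == ncol(r))
  r <- abs(r)
  diag(r) <- 0                        # ignore self-correlation
  keep <- rep(TRUE, ncol(r))
  for (j in seq_len(ncol(r))) {
    if (!keep[j]) next                # already dropped, skip
    # drop every later, still kept variable too close to variable j
    keep[keep & seq_along(keep) > j & r[j, ] > cutoff] <- FALSE
  }
  keep
}

r <- cor(quakes)
keep <- drop_correlated(r, cutoff = 0.8)
colnames(r)[keep]                     # names of the variables retained
```

Which variable of a correlated pair survives depends on column order,
so this is only one of several reasonable choices. It also does not
apply if the rows and columns really are two different sets of
variables.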


