[R] Remove highly correlated variables from a data frame or matrix

Ana Marija
Thu Nov 14 19:50:45 CET 2019


Hello,

I have a data frame like this (a matrix):
> head(calc.rho)
            rs9900318 rs8069906 rs9908521 rs9908336 rs9908870 rs9895995
rs56192520      0.903     0.268     0.327     0.327     0.327     0.582
rs3764410       0.928     0.276     0.336     0.336     0.336     0.598
rs145984817     0.975     0.309     0.371     0.371     0.371     0.638
rs1807401       0.975     0.309     0.371     0.371     0.371     0.638
rs1807402       0.975     0.309     0.371     0.371     0.371     0.638
rs35350506      0.975     0.309     0.371     0.371     0.371     0.638

> dim(calc.rho)
[1] 246 246

I would like to remove all highly correlated variables from this data,
i.e. those with a correlation greater than 0.8.

I tried this:

> data<- calc.rho[,!apply(calc.rho,2,function(x) any(abs(x) > 0.80))]
> dim(data)
[1] 246   0

But this removes everything.

Can you please advise?

Thanks,
Ana
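
A result of 246 x 0 means that every column of calc.rho contains at least
one value whose absolute value exceeds 0.8. If calc.rho is a symmetric
correlation matrix with rows and columns in the same order (an assumption;
the head() output above does not confirm it), the diagonal of 1s alone is
enough to cause this. The sketch below reproduces the effect on made-up toy
data and shows one possible greedy filter. The helper name drop_correlated
and the variables x1..x5 are hypothetical, and the code assumes the matrix
contains no NA values.

```r
# Hypothetical toy data: x2 is nearly a copy of x1, and x5 of x4.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
x3 <- rnorm(n)
x4 <- rnorm(n)
x5 <- x4 + rnorm(n, sd = 0.1)
rho <- cor(cbind(x1, x2, x3, x4, x5))

# The attempt from the post: each column includes its own diagonal 1,
# so any(abs(x) > 0.80) is TRUE for every column and none is kept.
attempt <- rho[, !apply(rho, 2, function(x) any(abs(x) > 0.80)), drop = FALSE]
dim(attempt)        # 5 0

# Greedy filter (assumes a square, symmetric matrix without NAs):
# keep a variable only if its absolute correlation with every
# variable kept so far is at or below the cutoff.
drop_correlated <- function(rho, cutoff = 0.8) {
  keep <- integer(0)
  for (j in seq_len(ncol(rho))) {
    if (all(abs(rho[j, keep]) <= cutoff)) keep <- c(keep, j)
  }
  rho[keep, keep, drop = FALSE]
}

reduced <- drop_correlated(rho)
colnames(reduced)   # "x1" "x3" "x4"
```

Which member of a correlated pair survives depends on column order, so this
is only one of several reasonable choices. The caret package also provides
findCorrelation(), which returns the columns to remove for a given cutoff
and may be a better fit if that package is available.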


