[R] Remove highly correlated variables from a data frame or matrix

Ana Marija sokovic.anamarija using gmail.com
Thu Nov 14 21:11:55 CET 2019


I don't understand. I have to keep only pairs of variables with
correlation less than 0.8 in order to proceed with some calculations.
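
To make the goal concrete: the attempt quoted below probably returns
zero columns because every column of a correlation matrix contains that
variable's correlation with itself (1), so any(abs(x) > 0.80) is TRUE
for all 246 columns. What follows is only a minimal sketch of one
possible alternative, assuming calc.rho has the same variables in its
rows and columns. drop_correlated is a hypothetical helper, not
something from this thread, and the toy matrix holds made-up values. It
greedily drops the variable with the most partners above the cutoff
until no remaining pair exceeds it.

```r
# Hypothetical helper (not from the thread): greedily drop variables until
# no remaining pair has |correlation| above `cutoff`.
drop_correlated <- function(rho, cutoff = 0.8) {
  # Assumption: rows and columns name the same variables; align their order.
  rho <- abs(rho[colnames(rho), , drop = FALSE])
  diag(rho) <- 0                      # ignore each variable's correlation with itself
  keep <- colnames(rho)
  repeat {
    sub <- rho[keep, keep, drop = FALSE]
    # number of partners above the cutoff for each remaining variable
    n_high <- colSums(sub > cutoff, na.rm = TRUE)
    if (all(n_high == 0)) break
    keep <- keep[-which.max(n_high)]  # drop the variable with the most partners
  }
  keep
}

# Toy example with made-up values, not the real calc.rho
toy <- matrix(c(1.00, 0.90, 0.20,
                0.90, 1.00, 0.30,
                0.20, 0.30, 1.00),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
kept <- drop_correlated(toy)
data <- toy[kept, kept]
```

With the real matrix this would be kept <- drop_correlated(calc.rho)
followed by data <- calc.rho[kept, kept]. The greedy rule is only one
choice: which member of a correlated pair gets dropped is arbitrary
here, and ties go to the first variable in column order.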

On Thu, Nov 14, 2019 at 2:09 PM Bert Gunter <bgunter.4567 using gmail.com> wrote:
>
> Obvious advice:
>
> DON'T DO THIS!
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Thu, Nov 14, 2019 at 10:50 AM Ana Marija <sokovic.anamarija using gmail.com> wrote:
>>
>> Hello,
>>
>> I have a data frame like this (a matrix):
>> head(calc.rho)
>>             rs9900318 rs8069906 rs9908521 rs9908336 rs9908870 rs9895995
>> rs56192520      0.903     0.268     0.327     0.327     0.327     0.582
>> rs3764410       0.928     0.276     0.336     0.336     0.336     0.598
>> rs145984817     0.975     0.309     0.371     0.371     0.371     0.638
>> rs1807401       0.975     0.309     0.371     0.371     0.371     0.638
>> rs1807402       0.975     0.309     0.371     0.371     0.371     0.638
>> rs35350506      0.975     0.309     0.371     0.371     0.371     0.638
>>
>> > dim(calc.rho)
>> [1] 246 246
>>
>> I would like to remove from this data all highly correlated variables,
>> those with correlation above 0.8.
>>
>> I tried this:
>>
>> > data<- calc.rho[,!apply(calc.rho,2,function(x) any(abs(x) > 0.80))]
>> > dim(data)
>> [1] 246   0
>>
>> But this removes everything. Can you please advise?
>>
>> Thanks,
>> Ana