[R] Choosing subset of data
oslo
hokut1 at yahoo.com
Fri May 20 20:46:31 CEST 2016
Hi all,
I have a large data set (a small part is given below) in which the V1 column contains repeated values; that is, rs941873, rs12307687, ... each appear many times. For each repeated name in V1, I need to choose only the one SNP (the first column, named rs) that has the smallest Pvalue within that V1 group.
Your help is truly appreciated.
rs          Chr    V6         A1  A2  Freq    Effect   StdErr  Pvalue    V1          Gene
rs941873    chr10  81139462   a   g   0.4117  -0.0541  0.0103  1.52E-07  rs941873    no_value
rs634552    chr11  75282052   t   g   0.3735  0.0159   0.0099  1.08E-01  rs941873    SERPINH1
rs11107175  chr12  94161719   t   c   0.0896  -0.0386  0.0176  2.85E-02  rs941873    CRADD
rs12307687  chr12  47175866   a   t   0.7379  -0.0208  0.0135  1.23E-01  rs12307687  SLC38A4
rs3917155   chr14  76444685   c   g   0.0495  0.0153   0.0371  6.80E-01  rs941873    TGFB3
rs1600640   chr15  84603034   t   g   0.1791  -0.0448  0.0123  2.75E-04  rs12307687  ADAMTSL3
rs2871865   chr15  99194896   c   g   0.5515  0.0191   0.0106  7.09E-02  rs12307687  IGF1R
rs2955250   chr17  61959740   t   c   0.6945  0.0277   0.0129  3.17E-02  rs12307687  GH2
rs228758    chr17  42148205   t   c   0.1222  -0.0265  0.015   7.72E-02  rs12307687  G6PC3
rs224333    chr20  34023962   a   g   0.8606  0.0568   0.0246  2.10E-02  rs10071837  GDF5
rs4681725   chr3   56692321   t   g   0.2362  0.0386   0.011   4.45E-04  rs10071837  C3orf63
rs7652177   chr3   171969077  c   g   0.1478  -0.0458  0.0134  6.34E-04  rs10071837  FNDC3B
rs925098    chr4   17919811   a   g   0.6529  -0.0563  0.0097  5.55E-09  rs925098    LCORL
rs1662837   chr4   82168889   t   c   0.2728  -0.0411  0.0105  8.66E-05  rs925098    no_value
rs10071837  chr5   33381581   t   c   0.424   -0.0324  0.0094  5.74E-04  rs925098    no_value
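
To make the request concrete, here is a minimal base R sketch of the kind of selection described above. It is only one possible approach, not a definitive answer. The data frame name `dat` is an assumption, and the sample rows are reduced to the three columns the selection needs (rs, Pvalue, V1).

```r
# Sample rows from the post, reduced to the columns needed for the selection.
# `dat` is a hypothetical name; the real data frame would hold all columns.
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
rs          Pvalue    V1
rs941873    1.52E-07  rs941873
rs634552    1.08E-01  rs941873
rs11107175  2.85E-02  rs941873
rs12307687  1.23E-01  rs12307687
rs3917155   6.80E-01  rs941873
rs1600640   2.75E-04  rs12307687
rs2871865   7.09E-02  rs12307687
rs2955250   3.17E-02  rs12307687
rs228758    7.72E-02  rs12307687
rs224333    2.10E-02  rs10071837
rs4681725   4.45E-04  rs10071837
rs7652177   6.34E-04  rs10071837
rs925098    5.55E-09  rs925098
rs1662837   8.66E-05  rs925098
rs10071837  5.74E-04  rs925098
")

# Sort by Pvalue so that the smallest value comes first within every V1 group.
ord <- dat[order(dat$Pvalue), ]

# Keep only the first row seen for each V1 value, i.e. the one with the
# smallest Pvalue. If two rows in a group tie, only one of them is kept.
best <- ord[!duplicated(ord$V1), ]
best
```

For the sample rows this keeps one SNP per V1 value: rs941873, rs1600640, rs4681725 and rs925098. Sorting first and then dropping duplicates keeps whole rows, so the same two lines should work unchanged on the full data frame with all eleven columns.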