[R] Choosing subset of data

oslo hokut1 at yahoo.com
Fri May 20 20:46:31 CEST 2016


Hi all;
I have a large data set (a small part is given below), and the V1 column contains repeated values; that is, rs941873, rs12307687, ... each occur many times. For each repeated name in V1, I need to choose only the one SNP (first column, named rs) that has the smallest Pvalue within that V1 group.
Your help is truly appreciated,




rs          Chr    V6         A1  A2  Freq    Effect   StdErr  Pvalue    V1          Gene
rs941873    chr10  81139462   a   g   0.4117  -0.0541  0.0103  1.52E-07  rs941873    no_value
rs634552    chr11  75282052   t   g   0.3735  0.0159   0.0099  1.08E-01  rs941873    SERPINH1
rs11107175  chr12  94161719   t   c   0.0896  -0.0386  0.0176  2.85E-02  rs941873    CRADD
rs12307687  chr12  47175866   a   t   0.7379  -0.0208  0.0135  1.23E-01  rs12307687  SLC38A4
rs3917155   chr14  76444685   c   g   0.0495  0.0153   0.0371  6.80E-01  rs941873    TGFB3
rs1600640   chr15  84603034   t   g   0.1791  -0.0448  0.0123  2.75E-04  rs12307687  ADAMTSL3
rs2871865   chr15  99194896   c   g   0.5515  0.0191   0.0106  7.09E-02  rs12307687  IGF1R
rs2955250   chr17  61959740   t   c   0.6945  0.0277   0.0129  3.17E-02  rs12307687  GH2
rs228758    chr17  42148205   t   c   0.1222  -0.0265  0.015   7.72E-02  rs12307687  G6PC3
rs224333    chr20  34023962   a   g   0.8606  0.0568   0.0246  2.10E-02  rs10071837  GDF5
rs4681725   chr3   56692321   t   g   0.2362  0.0386   0.011   4.45E-04  rs10071837  C3orf63
rs7652177   chr3   171969077  c   g   0.1478  -0.0458  0.0134  6.34E-04  rs10071837  FNDC3B
rs925098    chr4   17919811   a   g   0.6529  -0.0563  0.0097  5.55E-09  rs925098    LCORL
rs1662837   chr4   82168889   t   c   0.2728  -0.0411  0.0105  8.66E-05  rs925098    no_value
rs10071837  chr5   33381581   t   c   0.424   -0.0324  0.0094  5.74E-04  rs925098    no_value
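
To make the goal concrete, here is a minimal base R sketch of one way the selection could be done. It is only an illustration under assumptions: the data frame name `dat` and the result name `best` are hypothetical, and only the rs, Pvalue and V1 columns from the sample rows are reproduced. The idea is to sort by Pvalue and then keep the first row seen for each V1 value.

```r
# Sample rows from the post, reduced to the three columns needed here.
# 'dat' is a hypothetical name; the real data set would be read from file.
dat <- data.frame(
  rs = c("rs941873", "rs634552", "rs11107175", "rs12307687", "rs3917155",
         "rs1600640", "rs2871865", "rs2955250", "rs228758", "rs224333",
         "rs4681725", "rs7652177", "rs925098", "rs1662837", "rs10071837"),
  Pvalue = c(1.52e-07, 1.08e-01, 2.85e-02, 1.23e-01, 6.80e-01,
             2.75e-04, 7.09e-02, 3.17e-02, 7.72e-02, 2.10e-02,
             4.45e-04, 6.34e-04, 5.55e-09, 8.66e-05, 5.74e-04),
  V1 = c("rs941873", "rs941873", "rs941873", "rs12307687", "rs941873",
         "rs12307687", "rs12307687", "rs12307687", "rs12307687", "rs10071837",
         "rs10071837", "rs10071837", "rs925098", "rs925098", "rs925098"),
  stringsAsFactors = FALSE
)

# Sort rows by increasing Pvalue.
ord <- dat[order(dat$Pvalue), ]

# duplicated() flags every occurrence of a V1 value after the first, so
# negating it keeps exactly one row per V1: the one with the smallest Pvalue.
best <- ord[!duplicated(ord$V1), ]
print(best)
```

For these sample rows the SNPs kept are rs941873, rs1600640, rs4681725 and rs925098, one per distinct V1 value. If two rows in the same V1 group tie on Pvalue, this sketch keeps whichever comes first after sorting.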



