[R] Choosing subset of data

David Winsemius dwinsemius at comcast.net
Sat May 21 03:35:44 CEST 2016


> On May 20, 2016, at 11:46 AM, oslo via R-help <r-help at r-project.org> wrote:
> 
> Hi all;
> I have a big data set (a small part is given below), and the V1 column contains repeated values: rs941873, rs12307687, ... each appear many times. For each repeated name in V1, I need to choose only the one SNP (in the first column, named rs) that has the smallest Pvalue within that V1 group.
> Your help is truly appreciated,
> 
> 
> 
> 
>  rs                   Chr V6            A1  A2   Freq   Effect  StdErr         Pvalue  V1          Gene rs941873 chr10 81139462 a g 0.4117 -0.0541 0.0103 1.52E-07 rs941873        no_value rs634552 chr11 75282052 t g 0.3735 0.0159 0.0099 1.08E-01 rs941873 SERPINH1 rs11107175 chr12 94161719 t c 0.0896 -0.0386 0.0176 2.85E-02 rs941873  CRADD rs12307687 chr12 47175866 a t 0.7379 -0.0208 0.0135 1.23E-01 rs12307687 SLC38A4 rs3917155 chr14 76444685 c g 0.0495 0.0153 0.0371 6.80E-01 rs941873  TGFB3 rs1600640 chr15 84603034 t g 0.1791 -0.0448 0.0123 2.75E-04 rs12307687 ADAMTSL3 rs2871865 chr15 99194896 c g 0.5515 0.0191 0.0106 7.09E-02 rs12307687 IGF1R rs2955250 chr17 61959740 t c 0.6945 0.0277 0.0129 3.17E-02 rs12307687 GH2 rs228758 chr17 42148205 t c 0.1222 -0.0265 0.015 7.72E-02 rs12307687 G6PC3 rs224333 chr20 34023962 a g 0.8606 0.0568 0.0246 2.10E-02 rs10071837 GDF5 rs4681725 chr3 56692321 t g 0.2362 0.0386 0.011 4.45E-04 rs10071837 C3orf63 rs7652177 chr3   171969077 c g 0.1478 -0.0458 0.0134 6.34E-04 rs10071837 FNDC3B rs925098 chr4   17919811 a g 0.6529 -0.0563 0.0097 5.55E-09 rs925098 LCORL rs1662837 chr4  82168889 t c 0.2728 -0.0411 0.0105 8.66E-05 rs925098  no_value rs10071837 chr5  33381581 t c 0.424 -0.0324 0.0094 5.74E-04 rs925098  no_value
> 
> 	[[alternative HTML version deleted]]

The reason your data is garbled is that you failed to configure your email client to post in plain text. Please read the posting guide and the listinfo for r-help. You should also clarify what you want. Do you want only one line per SNP so the result would be another dataframe with a reduced number of lines of data?
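
If the answer to that question is yes, a minimal sketch in base R might look like the following. It rests on assumptions: the data frame name `dat` is hypothetical, the rows were re-split by hand from the run-together text in the post (11 fields per row, so the reconstruction may not match the real file), and "smallest Pvalue within V1" is read as "one row per distinct V1 value". Ties would keep whichever tied row sorts first.

```r
# Hand-reconstructed sample; 'dat' is a hypothetical name, not from the post.
dat <- read.table(text = "
rs Chr V6 A1 A2 Freq Effect StdErr Pvalue V1 Gene
rs941873 chr10 81139462 a g 0.4117 -0.0541 0.0103 1.52E-07 rs941873 no_value
rs634552 chr11 75282052 t g 0.3735 0.0159 0.0099 1.08E-01 rs941873 SERPINH1
rs11107175 chr12 94161719 t c 0.0896 -0.0386 0.0176 2.85E-02 rs941873 CRADD
rs12307687 chr12 47175866 a t 0.7379 -0.0208 0.0135 1.23E-01 rs12307687 SLC38A4
rs3917155 chr14 76444685 c g 0.0495 0.0153 0.0371 6.80E-01 rs941873 TGFB3
rs1600640 chr15 84603034 t g 0.1791 -0.0448 0.0123 2.75E-04 rs12307687 ADAMTSL3
rs2871865 chr15 99194896 c g 0.5515 0.0191 0.0106 7.09E-02 rs12307687 IGF1R
rs2955250 chr17 61959740 t c 0.6945 0.0277 0.0129 3.17E-02 rs12307687 GH2
rs228758 chr17 42148205 t c 0.1222 -0.0265 0.015 7.72E-02 rs12307687 G6PC3
rs224333 chr20 34023962 a g 0.8606 0.0568 0.0246 2.10E-02 rs10071837 GDF5
rs4681725 chr3 56692321 t g 0.2362 0.0386 0.011 4.45E-04 rs10071837 C3orf63
rs7652177 chr3 171969077 c g 0.1478 -0.0458 0.0134 6.34E-04 rs10071837 FNDC3B
rs925098 chr4 17919811 a g 0.6529 -0.0563 0.0097 5.55E-09 rs925098 LCORL
rs1662837 chr4 82168889 t c 0.2728 -0.0411 0.0105 8.66E-05 rs925098 no_value
rs10071837 chr5 33381581 t c 0.424 -0.0324 0.0094 5.74E-04 rs925098 no_value
", header = TRUE, stringsAsFactors = FALSE)

# Sort by V1, then by ascending Pvalue within each V1 value.
ord <- dat[order(dat$V1, dat$Pvalue), ]

# duplicated() flags every row after the first for each V1 value, so
# negating it keeps the smallest-Pvalue row per group.
best <- ord[!duplicated(ord$V1), ]

print(best[, c("rs", "Pvalue", "V1", "Gene")])
```

The result is another data frame with one line per distinct V1 value. If instead one line per value of the rs column is wanted, the same pattern would apply with `rs` in place of `V1`.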

______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA
