[R] Choosing subset of data
David Winsemius
dwinsemius at comcast.net
Sat May 21 03:35:44 CEST 2016
> On May 20, 2016, at 11:46 AM, oslo via R-help <r-help at r-project.org> wrote:
>
> Hi all;
> I have a big data set (a small part is given below), and the V1 column has repeated values in it: rs941873, rs12307687, ... each appear many times. For each repeated name in V1, I need to choose only one SNP (from the first column, named rs): the one with the smallest Pvalue.
> Your help is truly appreciated,
>
> rs         Chr   V6        A1 A2 Freq   Effect  StdErr Pvalue   V1         Gene
> rs941873   chr10 81139462  a  g  0.4117 -0.0541 0.0103 1.52E-07 rs941873   no_value
> rs634552   chr11 75282052  t  g  0.3735 0.0159  0.0099 1.08E-01 rs941873   SERPINH1
> rs11107175 chr12 94161719  t  c  0.0896 -0.0386 0.0176 2.85E-02 rs941873   CRADD
> rs12307687 chr12 47175866  a  t  0.7379 -0.0208 0.0135 1.23E-01 rs12307687 SLC38A4
> rs3917155  chr14 76444685  c  g  0.0495 0.0153  0.0371 6.80E-01 rs941873   TGFB3
> rs1600640  chr15 84603034  t  g  0.1791 -0.0448 0.0123 2.75E-04 rs12307687 ADAMTSL3
> rs2871865  chr15 99194896  c  g  0.5515 0.0191  0.0106 7.09E-02 rs12307687 IGF1R
> rs2955250  chr17 61959740  t  c  0.6945 0.0277  0.0129 3.17E-02 rs12307687 GH2
> rs228758   chr17 42148205  t  c  0.1222 -0.0265 0.015  7.72E-02 rs12307687 G6PC3
> rs224333   chr20 34023962  a  g  0.8606 0.0568  0.0246 2.10E-02 rs10071837 GDF5
> rs4681725  chr3  56692321  t  g  0.2362 0.0386  0.011  4.45E-04 rs10071837 C3orf63
> rs7652177  chr3  171969077 c  g  0.1478 -0.0458 0.0134 6.34E-04 rs10071837 FNDC3B
> rs925098   chr4  17919811  a  g  0.6529 -0.0563 0.0097 5.55E-09 rs925098   LCORL
> rs1662837  chr4  82168889  t  c  0.2728 -0.0411 0.0105 8.66E-05 rs925098   no_value
> rs10071837 chr5  33381581  t  c  0.424  -0.0324 0.0094 5.74E-04 rs925098   no_value
>
> [[alternative HTML version deleted]]
The reason your data is garbled is that you failed to configure your email client to post in plain text. Please read the posting guide and the listinfo for r-help. You should also clarify what you want. Do you want only one line per SNP so the result would be another dataframe with a reduced number of lines of data?
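One way to produce that reduced data frame, sketched in base R under two assumptions: that the answer to the question is yes (one row per distinct V1 value, keeping the row with the smallest Pvalue), and that the data are in a data frame called `dat` (a hypothetical name; the sample rows from the question are re-entered here so the example is self-contained). The idea is to sort by V1 and then by Pvalue, and keep only the first row of each V1 group with `duplicated()`.

```r
# Sample rows from the question, re-entered as plain text.
# `dat` is a hypothetical name standing in for the full data set.
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
rs         Chr   V6        A1 A2 Freq   Effect  StdErr Pvalue   V1         Gene
rs941873   chr10 81139462  a  g  0.4117 -0.0541 0.0103 1.52E-07 rs941873   no_value
rs634552   chr11 75282052  t  g  0.3735 0.0159  0.0099 1.08E-01 rs941873   SERPINH1
rs11107175 chr12 94161719  t  c  0.0896 -0.0386 0.0176 2.85E-02 rs941873   CRADD
rs12307687 chr12 47175866  a  t  0.7379 -0.0208 0.0135 1.23E-01 rs12307687 SLC38A4
rs3917155  chr14 76444685  c  g  0.0495 0.0153  0.0371 6.80E-01 rs941873   TGFB3
rs1600640  chr15 84603034  t  g  0.1791 -0.0448 0.0123 2.75E-04 rs12307687 ADAMTSL3
rs2871865  chr15 99194896  c  g  0.5515 0.0191  0.0106 7.09E-02 rs12307687 IGF1R
rs2955250  chr17 61959740  t  c  0.6945 0.0277  0.0129 3.17E-02 rs12307687 GH2
rs228758   chr17 42148205  t  c  0.1222 -0.0265 0.015  7.72E-02 rs12307687 G6PC3
rs224333   chr20 34023962  a  g  0.8606 0.0568  0.0246 2.10E-02 rs10071837 GDF5
rs4681725  chr3  56692321  t  g  0.2362 0.0386  0.011  4.45E-04 rs10071837 C3orf63
rs7652177  chr3  171969077 c  g  0.1478 -0.0458 0.0134 6.34E-04 rs10071837 FNDC3B
rs925098   chr4  17919811  a  g  0.6529 -0.0563 0.0097 5.55E-09 rs925098   LCORL
rs1662837  chr4  82168889  t  c  0.2728 -0.0411 0.0105 8.66E-05 rs925098   no_value
rs10071837 chr5  33381581  t  c  0.424  -0.0324 0.0094 5.74E-04 rs925098   no_value
")

# Sort by V1, then by ascending Pvalue within each V1 group.
ord <- dat[order(dat$V1, dat$Pvalue), ]

# The first row of each V1 group now holds that group's smallest Pvalue;
# duplicated() flags every later row with the same V1, so drop those.
best <- ord[!duplicated(ord$V1), ]
best
```

Because `order()` leaves unresolved ties in their original order, a group in which two rows share the smallest Pvalue keeps only the earlier of them. If all tied rows should be kept instead, a filter such as `dat[dat$Pvalue == ave(dat$Pvalue, dat$V1, FUN = min), ]` is an alternative.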
______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius
Alameda, CA, USA