[R] Choosing and preserving a random duplicate

Wed Mar 31 01:33:24 CEST 2010

Dear R-Helpers,

I have a dataframe (g10df) formatted like this:

    GENE             PVAL
1 KCTD12      4.06904e-22
2 UNC93A      9.91852e-22
3  CDKN3      1.24695e-21
4 CLEC2B      4.71759e-21
5   DAB2      1.12062e-20

The rows are sorted in ascending order by PVAL, and the result must keep
that relative order. Some genes appear more than once in the first column,
each time with a different p-value in the second; the p-values themselves
are unique. I had intended to use the plyr package to collapse these duplicates:

ddply(g10df, "GENE", summarise, PVAL = mean(PVAL))

But it occurred to me that, rather than averaging the p-values for each set
of duplicates, I should select one duplicate at random and remove the
rest.

I am relatively new to R, and I have not been able to find a way to do this,
with plyr or otherwise. Any help would be greatly appreciated.
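
For illustration, the operation described above might be sketched in base R
roughly as follows. This is only a sketch under stated assumptions: the toy
data frame stands in for g10df (its values are made up, not real results),
and the names idx, keep and dedup are hypothetical. The idea is to shuffle
the row indices, keep the first occurrence of each gene in the shuffled
order (which is a uniformly random pick among that gene's duplicates), and
then sort the kept indices to restore the original ranking.

```r
# Toy stand-in for g10df; the values are invented for illustration only.
g10df <- data.frame(
  GENE = c("KCTD12", "UNC93A", "KCTD12", "CDKN3", "UNC93A", "DAB2"),
  PVAL = c(1e-22, 2e-22, 3e-21, 4e-21, 5e-20, 6e-20),
  stringsAsFactors = FALSE
)

set.seed(1)  # only so that the random choice is reproducible

# Random permutation of the row indices.
idx <- sample.int(nrow(g10df))

# In the shuffled order, keep the first row seen for each gene.
# Because the order is random, each duplicate is equally likely to be kept.
keep <- idx[!duplicated(g10df$GENE[idx])]

# Sorting the kept indices restores the original ascending-PVAL order.
dedup <- g10df[sort(keep), ]
```

Dropping the set.seed() call gives a different random selection on each
run; the final sort() is what preserves the relative order of the rows.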

Thanks and best regards,

Jeff
