[R] Choosing and preserving a random duplicate
jeff.m.ewers
jeff.m.ewers at vanderbilt.edu
Wed Mar 31 01:33:24 CEST 2010
Dear R-Helpers,
I have a dataframe (g10df) formatted like this:
    GENE        PVAL
1 KCTD12 4.06904e-22
2 UNC93A 9.91852e-22
3  CDKN3 1.24695e-21
4 CLEC2B 4.71759e-21
5   DAB2 1.12062e-20
The rows are sorted in ascending order by PVAL, and the result needs to keep
that relative order. Some genes appear more than once in the first column,
each time with a different p-value in the second (the p-values themselves are
unique). I had intended to use the plyr package to collapse these duplicates:
ddply(g10df, "GENE", summarise, PVAL = mean(PVAL))
But it occurred to me that, rather than averaging the p-values for each set
of duplicates, I should select one duplicate at random and remove the rest.
I am relatively new to R, and I have not been able to find a way to do this,
with plyr or otherwise. Any help would be greatly appreciated.
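To make the request concrete, here is a minimal base-R sketch of the behaviour described above, on made-up data. The duplicated KCTD12 and DAB2 rows, the p-value 3.5e-20, and the helper name pick_random_duplicate are assumptions for illustration only, and this may well not be the idiomatic way to do it.

```r
# Toy data in the same shape as g10df, already sorted by ascending PVAL.
# The duplicated rows are invented for illustration.
g10df <- data.frame(
  GENE = c("KCTD12", "UNC93A", "KCTD12", "CLEC2B", "DAB2", "DAB2"),
  PVAL = c(4.06904e-22, 9.91852e-22, 1.24695e-21,
           4.71759e-21, 1.12062e-20, 3.5e-20),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep one randomly chosen row per value of `key`
# and return the survivors in their original row order.
pick_random_duplicate <- function(df, key = "GENE") {
  idx  <- sample(nrow(df))                   # row indices in random order
  keep <- idx[!duplicated(df[[key]][idx])]   # first index met for each gene
  df[sort(keep), , drop = FALSE]             # restore the original ordering
}

set.seed(1)  # only so that the random choice is repeatable
result <- pick_random_duplicate(g10df)
print(result)
```

Because the rows are visited in a random permutation, each duplicate of a gene should be equally likely to be the one kept, and sorting the kept indices preserves the ranking by PVAL.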
Thanks and best regards,
Jeff
--
View this message in context: http://n4.nabble.com/Choosing-and-preserving-a-random-duplicate-tp1746091p1746091.html
Sent from the R help mailing list archive at Nabble.com.