[R] Subsetting for the ten highest values by group in a dataframe
Sam Albers
tonightsthenight at gmail.com
Fri Jan 27 20:26:48 CET 2012
Hello,
I am looking for a way to subset a data frame by keeping the rows with the
ten highest values of a variable. This needs to be done separately within
each level of a grouping factor.
## I've used plyr here but I'm not married to this approach
require(plyr)
## I've created a data.frame with two groups (z) and an id variable (y)
df <- data.frame(x=rnorm(400, mean=20), y=1:400, z=c("A","B"))
## So using ddply I can find the highest value of x
df.max1 <- ddply(df, c("z"), subset, x==sort(x, TRUE)[1])
## Or the 2nd highest value
df.max2 <- ddply(df, c("z"), subset, x==sort(x, TRUE)[2])
## And so on. But when I pass a sequence of indices like so to get
## the top ten values, I don't get a warning message, just
## two values that don't really make sense to me
df.max <- ddply(df, c("z"), subset, x==sort(x, TRUE)[1:10])
## So no error message when I use the method above, which is clearly wrong.
## But I really am not sure how to diagnose the problem.
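One possible explanation, sketched here on a small made-up vector rather than the data above: `==` compares element by element and recycles the shorter vector, so each x is checked against whichever top value happens to line up with its position, not against the whole set. With 200 rows per group and 10 values, the lengths divide evenly, which would explain why R gives no warning. `%in%` tests membership instead.

```r
## Toy vector, made up for illustration only
x <- c(5, 9, 1, 7, 3, 8)
top2 <- sort(x, decreasing = TRUE)[1:2]   # 9 8

## `==` recycles top2 to 9 8 9 8 9 8 and compares position by position,
## so only the last element (8 against 8) matches
eq <- x == top2
## FALSE FALSE FALSE FALSE FALSE  TRUE

## `%in%` asks whether each x appears anywhere in top2
isin <- x %in% top2
## FALSE  TRUE FALSE FALSE FALSE  TRUE
```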
## Can anyone suggest a way to subset a data.frame with groups to
## select the top ten max values in that data.frame for each group?
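A minimal sketch of one way this might be done in base R, without plyr: split the data frame by group, sort each piece by x in decreasing order, and keep the first ten rows. The seed and the name `top10` are assumptions added for illustration. If x has ties at the cutoff, this keeps exactly ten rows per group and drops the rest of the tied rows.

```r
set.seed(1)  # assumed seed, only so the example is reproducible
df <- data.frame(x = rnorm(400, mean = 20), y = 1:400, z = c("A", "B"))

## Split by group, order each piece by x (largest first), keep ten rows
top10 <- do.call(rbind, lapply(split(df, df$z), function(d) {
  head(d[order(d$x, decreasing = TRUE), ], 10)
}))

## Count the rows kept per group
table(top10$z)
```

Staying with plyr, replacing `==` with `%in%` in the ddply call above may also work, though with ties it could return more than ten rows per group.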
## Thanks so much in advance!
Sam