[R] Subsetting for the ten highest values by group in a dataframe

Fri Jan 27 20:26:48 CET 2012

Hello,

I am looking for a way to subset a data frame so that it keeps only the
rows with the ten highest values of a variable. This needs to be done
separately within each level of a factor.

## I've used plyr here but I'm not married to this approach
require(plyr)

## I've created a data.frame with two groups (z) and an id variable (y)
df <- data.frame(x=rnorm(400, mean=20), y=1:400, z=c("A","B"))

## So using ddply I can find the highest value of x
df.max1 <- ddply(df, c("z"), subset, x==sort(x, TRUE)[1])

## Or the 2nd highest value
df.max2 <- ddply(df, c("z"), subset, x==sort(x, TRUE)[2])

## And so on... but when I pass a range of indices like so to get the
## top ten values, I get no warning message, only two values that
## don't really make sense to me
df.max <- ddply(df, c("z"), subset, x==sort(x, TRUE)[1:10])

## So no error message when I use the method above, which is clearly wrong.
## But I really am not sure how to diagnose the problem.
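
## A likely explanation, sketched on a single vector rather than on the
## grouped data: `==` compares element by element and recycles the
## shorter vector, so each x[i] is checked against only one of the ten
## values (chosen by position), not against all ten. Because 200 is a
## multiple of 10, the recycling is silent. `%in%` tests set membership
## instead. The seed and the names `x` and `top10` below are assumptions
## made for illustration only.

```r
## Illustration of recycling with == versus membership with %in%
set.seed(1)                                   # assumed seed, reproducibility only
x <- rnorm(200, mean = 20)                    # one group's worth of values
top10 <- sort(x, decreasing = TRUE)[1:10]     # the ten largest values

## Element-wise comparison: top10 is recycled 20 times along x, so a row
## is TRUE only if x[i] happens to equal the value at its recycled position
n_equal <- sum(x == top10)

## Membership test: TRUE for every x that is one of the ten largest values
n_in <- sum(x %in% top10)

n_equal
n_in
```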

## Can anyone suggest a way to subset a data.frame with groups to
## select the rows with the ten highest values of x for each group?
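
## For reference, a minimal sketch of one possible approach, not
## necessarily the best one: order each group by x in decreasing order
## and keep the first ten rows. The helper `top_n_rows`, the result
## names `df.top10` and `df.top10.plyr`, and the seed are hypothetical
## names introduced here; ties in x are not treated specially.

```r
## Same example data as above, with an assumed seed for reproducibility
set.seed(42)
df <- data.frame(x = rnorm(400, mean = 20), y = 1:400, z = c("A", "B"))

## Hypothetical helper: the n rows of d with the largest x
top_n_rows <- function(d, n = 10) {
  head(d[order(d$x, decreasing = TRUE), ], n)
}

## Base R: split by group, take the top rows of each piece, bind back
df.top10 <- do.call(rbind, lapply(split(df, df$z), top_n_rows))

## plyr variant of the same idea, run only if plyr is installed
if (requireNamespace("plyr", quietly = TRUE)) {
  df.top10.plyr <- plyr::ddply(df, "z", top_n_rows)
}

table(df.top10$z)
```

## Subsetting with x %in% sort(x, TRUE)[1:10] inside ddply would be a
## smaller change to the original call, but it can return more than ten
## rows per group if x contains ties at the cutoff.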

## Thanks so much in advance.

Sam