[R] Add rank column to data frame as in SQL...
William Dunlap
wdunlap at tibco.com
Fri Jun 1 22:59:29 CEST 2012
Try using the rank function instead of the order function.
rank(x) is order(order(x)) if there are no ties.
Since you want reverse ranks do either rank(-x) or length(x)+1-rank(x).
E.g.,
> DF <- within(DF, rank2 <- ave(Salary, Company, FUN=function(x)rank(-x)))
> DF[DF$Company=="Toyota",]
Company Person Salary rank rank2
2 Toyota B 77744.522 2 2
3 Toyota C 93470.523 1 1
8 Toyota H 38611.409 5 4
9 Toyota I 1339.033 3 6
14 Toyota N 59956.583 6 3
16 Toyota P 18621.760 4 5
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Brigid Mooney
> Sent: Friday, June 01, 2012 5:23 PM
> To: r-help at r-project.org
> Subject: [R] Add rank column to data frame as in SQL...
>
> Hopefully this is an easy problem...
>
> I'm trying to add a partitioned rank column to a data frame where the
> rank is calculated separately across a partition by categories, the
> way you could easily do in SQL. I found this solution in the archives
> that looked like it might work:
>
> http://tolstoy.newcastle.edu.au/R/e11/help/10/09/8675.html
>
> The example has a data frame with several car companies, and employee
> salaries within them. A column is then added to the data.frame which
> should give the descending rank for each employee, partitioned by
> company.
>
>
> But when I implemented it, the results weren't the expected rankings.
> What am I doing wrong?
>
> set.seed(1)
> DF <- data.frame(Company=sample(c("Ford","Toyota","GM"),size=18,replace=TRUE),
> Person=LETTERS[1:18],Salary=runif(18)*1e5)
> DF <- within(DF, rank <- ave(Salary, Company, FUN=function(x)rev(order(x))))
>
> # Then checking each category manually
> DF[DF$Company == "Ford",]
> DF[DF$Company == "GM",]
> DF[DF$Company == "Toyota",]
>
> # My results show that it works for Ford and GM, but not Toyota
> > DF[DF$Company == "Ford",]
> Company Person Salary rank
> 1 Ford A 38003.52 4
> 5 Ford E 65167.38 2
> 10 Ford J 38238.80 3
> 11 Ford K 86969.08 1
> 12 Ford L 34034.90 5
> > DF[DF$Company == "GM",]
> Company Person Salary rank
> 4 GM D 21214.25 6
> 6 GM F 12555.51 7
> 7 GM G 26722.07 5
> 13 GM M 48208.01 4
> 15 GM O 49354.13 3
> 17 GM Q 82737.33 1
> 18 GM R 66846.67 2
> > DF[DF$Company == "Toyota",]
> Company Person Salary rank
> 2 Toyota B 77744.522 2
> 3 Toyota C 93470.523 1
> 8 Toyota H 38611.409 5
> 9 Toyota I 1339.033 3
> 14 Toyota N 59956.583 6
> 16 Toyota P 18621.760 4
>
>
> For reference, I'm using R 2.11.1 on a Windows 7 machine.
>
> Can anyone provide insight into how I am implementing this
> incorrectly, or give an alternate way to add such a partitioned rank
> column to a data frame?
>
> Thanks in advance,
> Brigid
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list