[R] Add rank column to data frame as in SQL...

William Dunlap wdunlap at tibco.com
Fri Jun 1 22:59:29 CEST 2012


Try using the rank function instead of the order function.
rank(x) is order(order(x)) if there are no ties.
Since you want reverse ranks do either rank(-x) or length(x)+1-rank(x).
E.g.,
  > DF <- within(DF, rank2 <- ave(Salary, Company, FUN=function(x)rank(-x)))
  > DF[DF$Company=="Toyota",]
     Company Person    Salary rank rank2
  2   Toyota      B 77744.522    2     2
  3   Toyota      C 93470.523    1     1
  8   Toyota      H 38611.409    5     4
  9   Toyota      I  1339.033    3     6
  14  Toyota      N 59956.583    6     3
  16  Toyota      P 18621.760    4     5

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Brigid Mooney
> Sent: Friday, June 01, 2012 5:23 PM
> To: r-help at r-project.org
> Subject: [R] Add rank column to data frame as in SQL...
> 
> Hopefully this is an easy problem...
> 
> I'm trying to add a partitioned rank column to a data frame where the
> rank is calculated separately across a partition by categories, the
> way you could easily do in SQL.  I found this solution in the archives
> that looked like it might work:
> 
> http://tolstoy.newcastle.edu.au/R/e11/help/10/09/8675.html
> 
> The example has a data frame with several car companies, and employee
> salaries within them.  A column is then added to the data.frame which
> should give the descending rank for each employee, partitioned by
> company.
> 
> 
> But when I implemented it, the results weren't the expected rankings.
>  What am I doing wrong?
> 
> set.seed(1)
> DF <- data.frame(Company=sample(c("Ford","Toyota","GM"),size=18,replace=TRUE),
> Person=LETTERS[1:18],Salary=runif(18)*1e5)
> DF <- within(DF, rank <- ave(Salary, Company, FUN=function(x)rev(order(x))))
> 
> # Then checking each category manually
> DF[DF$Company == "Ford",]
> DF[DF$Company == "GM",]
> DF[DF$Company == "Toyota",]
> 
> # My results show that it works for Ford and GM, but not Toyota
> > DF[DF$Company == "Ford",]
>    Company Person   Salary rank
> 1     Ford      A 38003.52    4
> 5     Ford      E 65167.38    2
> 10    Ford      J 38238.80    3
> 11    Ford      K 86969.08    1
> 12    Ford      L 34034.90    5
> > DF[DF$Company == "GM",]
>    Company Person   Salary rank
> 4       GM      D 21214.25    6
> 6       GM      F 12555.51    7
> 7       GM      G 26722.07    5
> 13      GM      M 48208.01    4
> 15      GM      O 49354.13    3
> 17      GM      Q 82737.33    1
> 18      GM      R 66846.67    2
> > DF[DF$Company == "Toyota",]
>    Company Person    Salary rank
> 2   Toyota      B 77744.522    2
> 3   Toyota      C 93470.523    1
> 8   Toyota      H 38611.409    5
> 9   Toyota      I  1339.033    3
> 14  Toyota      N 59956.583    6
> 16  Toyota      P 18621.760    4
> 
> 
> For reference, I'm using R 2.11.1 on a Windows 7 machine.
> 
> Can anyone provide insight into how I am implementing this
> incorrectly, or give an alternate way to add such a partitioned rank
> column to a data frame?
> 
> Thanks in advance,
> Brigid
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list