[R] Add rank column to data frame as in SQL...

Brigid Mooney bkmooney at gmail.com
Fri Jun 1 17:23:03 CEST 2012


Hopefully this is an easy problem...

I'm trying to add a partitioned rank column to a data frame where the
rank is calculated separately across a partition by categories, the
way you could easily do in SQL.  I found this solution in the archives
that looked like it might work:

http://tolstoy.newcastle.edu.au/R/e11/help/10/09/8675.html

The example has a data frame with several car companies, and employee
salaries within them.  A column is then added to the data.frame which
should give the descending rank for each employee, partitioned by
company.


But when I implemented it, the results weren't the expected rankings.
 What am I doing wrong?

set.seed(1)
DF <- data.frame(Company=sample(c("Ford","Toyota","GM"),size=18,replace=TRUE),
Person=LETTERS[1:18],Salary=runif(18)*1e5)
DF <- within(DF, rank <- ave(Salary, Company, FUN=function(x)rev(order(x))))

# Then checking each category manually
DF[DF$Company == "Ford",]
DF[DF$Company == "GM",]
DF[DF$Company == "Toyota",]

# My results show that it works for Ford and GM, but not Toyota
> DF[DF$Company == "Ford",]
   Company Person   Salary rank
1     Ford      A 38003.52    4
5     Ford      E 65167.38    2
10    Ford      J 38238.80    3
11    Ford      K 86969.08    1
12    Ford      L 34034.90    5
> DF[DF$Company == "GM",]
   Company Person   Salary rank
4       GM      D 21214.25    6
6       GM      F 12555.51    7
7       GM      G 26722.07    5
13      GM      M 48208.01    4
15      GM      O 49354.13    3
17      GM      Q 82737.33    1
18      GM      R 66846.67    2
> DF[DF$Company == "Toyota",]
   Company Person    Salary rank
2   Toyota      B 77744.522    2
3   Toyota      C 93470.523    1
8   Toyota      H 38611.409    5
9   Toyota      I  1339.033    3
14  Toyota      N 59956.583    6
16  Toyota      P 18621.760    4


For reference, I'm using R 2.11.1 on a Windows 7 machine.

Can anyone provide insight into how I am implementing this
incorrectly, or give an alternate way to add such a partitioned rank
column to a data frame?

Thanks in advance,
Brigid



More information about the R-help mailing list