[R] Add rank column to data frame as in SQL...

Fri Jun 1 19:48:36 CEST 2012

Hi,

Couldn't find any problems if I understand what you mean.

A.K.

> DF <-within(DF,rank<-ave(Salary,Company,FUN=function(x)rev(order(x))))
> DF
   Company Person    Salary rank
1       GM      A  7067.905    4
2     Ford      B  9946.616    3
3       GM      C 31627.171    8
4   Toyota      D 51863.426    1
5       GM      E 66200.508    7
6   Toyota      F 40683.019    4
7       GM      G 91287.592    3
8   Toyota      H 29360.337    2
9   Toyota      I 45906.573    7
10      GM      J 33239.467    6
11    Ford      K 65087.047    2
12  Toyota      L 25801.678    3
13      GM      M 47854.525    5
14      GM      N 76631.067    2
15  Toyota      O  8424.691    5
16      GM      P 87532.133    1
17  Toyota      Q 33907.294    6
18    Ford      R 83944.035    1
> DF[DF$Company=="Ford",]
   Company Person    Salary rank
2     Ford      B  9946.616    3
11    Ford      K 65087.047    2
18    Ford      R 83944.035    1
> DF[DF$Company=="Toyota",]
   Company Person    Salary rank
4   Toyota      D 51863.426    1
6   Toyota      F 40683.019    4
8   Toyota      H 29360.337    2
9   Toyota      I 45906.573    7
12  Toyota      L 25801.678    3
15  Toyota      O  8424.691    5
17  Toyota      Q 33907.294    6
> DF[DF$Company=="GM",]
   Company Person    Salary rank
1       GM      A  7067.905    4
3       GM      C 31627.171    8
5       GM      E 66200.508    7
7       GM      G 91287.592    3
10      GM      J 33239.467    6
13      GM      M 47854.525    5
14      GM      N 76631.067    2
16      GM      P 87532.133    1

----- Original Message -----
From: Brigid Mooney <bkmooney at gmail.com>
To: r-help at r-project.org
Cc: 
Sent: Friday, June 1, 2012 11:23 AM
Subject: [R] Add rank column to data frame as in SQL...

Hopefully this is an easy problem...

I'm trying to add a partitioned rank column to a data frame where the
rank is calculated separately across a partition by categories, the
way you could easily do in SQL.  I found this solution in the archives
that looked like it might work:

http://tolstoy.newcastle.edu.au/R/e11/help/10/09/8675.html

The example has a data frame with several car companies, and employee
salaries within them.  A column is then added to the data.frame which
should give the descending rank for each employee, partitioned by
company.

But when I implemented it, the results weren't the expected rankings.
What am I doing wrong?

set.seed(1)
DF <- data.frame(Company=sample(c("Ford","Toyota","GM"),size=18,replace=TRUE),
Person=LETTERS[1:18],Salary=runif(18)*1e5)
DF <- within(DF, rank <- ave(Salary, Company, FUN=function(x)rev(order(x))))

# Then checking each category manually
DF[DF$Company == "Ford",]
DF[DF$Company == "GM",]
DF[DF$Company == "Toyota",]

# My results show that it works for Ford and GM, but not Toyota
> DF[DF$Company == "Ford",]
   Company Person   Salary rank
1     Ford      A 38003.52    4
5     Ford      E 65167.38    2
10    Ford      J 38238.80    3
11    Ford      K 86969.08    1
12    Ford      L 34034.90    5
> DF[DF$Company == "GM",]
   Company Person   Salary rank
4       GM      D 21214.25    6
6       GM      F 12555.51    7
7       GM      G 26722.07    5
13      GM      M 48208.01    4
15      GM      O 49354.13    3
17      GM      Q 82737.33    1
18      GM      R 66846.67    2
> DF[DF$Company == "Toyota",]
   Company Person    Salary rank
2   Toyota      B 77744.522    2
3   Toyota      C 93470.523    1
8   Toyota      H 38611.409    5
9   Toyota      I  1339.033    3
14  Toyota      N 59956.583    6
16  Toyota      P 18621.760    4

For reference, I'm using R 2.11.1 on a Windows 7 machine.

Can anyone provide insight into how I am implementing this
incorrectly, or give an alternate way to add such a partitioned rank
column to a data frame?

Thanks in advance,
Brigid

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.