[R] Add rank column to data frame as in SQL...

Fri Jun 1 20:21:29 CEST 2012

Hi,
Try this:

DF1<-DF[with(DF,rev(order(Company,Salary))),]

DF2<-within(DF1,rank<-ave(Salary,Company,FUN=function(x)rev(order(x))))

> DF2
   Company Person    Salary rank
15  Toyota      O 68278.808    1
5   Toyota      E 57487.220    2
11  Toyota      K 56090.075    3
12  Toyota      L 52602.772    4
14  Toyota      N 50764.182    5
18  Toyota      R 25816.593    6
2   Toyota      B 20754.511    7
6   Toyota      F  7706.438    8
13      GM      M 98509.522    1
9       GM      I 92861.520    2
4       GM      D 59571.200    3
1       GM      A 51116.978    4
17      GM      Q 23886.868    5
3       GM      C 22865.814    6
8     Ford      H 64279.549    1
16    Ford      P 60154.122    2
10    Ford      J 59809.242    3
7     Ford      G  3554.058    4

A.K.

________________________________
From: Brigid Rotigel <brotigel at gmail.com>
To: arun <smartpink111 at yahoo.com> 
Cc: R help <r-help at r-project.org> 
Sent: Friday, June 1, 2012 2:04 PM
Subject: Re: [R] Add rank column to data frame as in SQL...

Sorry, I meant to put that it didn't work for "GM", not "Toyota".  
> DF[DF$Company=="GM",]
   Company Person    Salary rank
1       GM      A  7067.905    4
3       GM      C 31627.171    8
5       GM      E 66200.508    7
7       GM      G 91287.592    3
10      GM      J 33239.467    6
13      GM      M 47854.525    5
14      GM      N 76631.067    2

16      GM      P 87532.133    1

In this data, person "G" earned 91287, but has rank 3 even though this is the highest salary in this partition.  The lowest salary earned is the 7067 by person "A", so this should have rank = 8, but in fact this gives rank = 4.  

Is that more clear?  

Thanks,
Brigid

On Fri, Jun 1, 2012 at 1:48 PM, arun <smartpink111 at yahoo.com> wrote:

Hi,
>
>Couldn't find any problems if I understand what you mean.
>
>A.K.
>
>
>
>> DF <-within(DF,rank<-ave(Salary,Company,FUN=function(x)rev(order(x))))
>> DF
>   Company Person    Salary rank
>1       GM      A  7067.905    4
>2     Ford      B  9946.616    3
>3       GM      C 31627.171    8
>4   Toyota      D 51863.426    1
>5       GM      E 66200.508    7
>6   Toyota      F 40683.019    4
>7       GM      G 91287.592    3
>8   Toyota      H 29360.337    2
>9   Toyota      I 45906.573    7
>10      GM      J 33239.467    6
>11    Ford      K 65087.047    2
>12  Toyota      L 25801.678    3
>13      GM      M 47854.525    5
>14      GM      N 76631.067    2
>15  Toyota      O  8424.691    5
>16      GM      P 87532.133    1
>17  Toyota      Q 33907.294    6
>18    Ford      R 83944.035    1
>> DF[DF$Company=="Ford",]
>   Company Person    Salary rank
>2     Ford      B  9946.616    3
>11    Ford      K 65087.047    2
>18    Ford      R 83944.035    1
>> DF[DF$Company=="Toyota",]
>   Company Person    Salary rank
>4   Toyota      D 51863.426    1
>6   Toyota      F 40683.019    4
>8   Toyota      H 29360.337    2
>9   Toyota      I 45906.573    7
>12  Toyota      L 25801.678    3
>15  Toyota      O  8424.691    5
>17  Toyota      Q 33907.294    6
>> DF[DF$Company=="GM",]
>   Company Person    Salary rank
>1       GM      A  7067.905    4
>3       GM      C 31627.171    8
>5       GM      E 66200.508    7
>7       GM      G 91287.592    3
>10      GM      J 33239.467    6
>13      GM      M 47854.525    5
>14      GM      N 76631.067    2
>16      GM      P 87532.133    1
>
>
>
>
>----- Original Message -----
>From: Brigid Mooney <bkmooney at gmail.com>
>To: r-help at r-project.org
>Cc:
>Sent: Friday, June 1, 2012 11:23 AM
>Subject: [R] Add rank column to data frame as in SQL...
>
>Hopefully this is an easy problem...
>
>I'm trying to add a partitioned rank column to a data frame where the
>rank is calculated separately across a partition by categories, the
>way you could easily do in SQL.  I found this solution in the archives
>that looked like it might work:
>
>http://tolstoy.newcastle.edu.au/R/e11/help/10/09/8675.html
>
>The example has a data frame with several car companies, and employee
>salaries within them.  A column is then added to the data.frame which
>should give the descending rank for each employee, partitioned by
>company.
>
>
>But when I implemented it, the results weren't the expected rankings.
>What am I doing wrong?
>
>set.seed(1)
>DF <- data.frame(Company=sample(c("Ford","Toyota","GM"),size=18,replace=TRUE),
>Person=LETTERS[1:18],Salary=runif(18)*1e5)
>DF <- within(DF, rank <- ave(Salary, Company, FUN=function(x)rev(order(x))))
>
># Then checking each category manually
>DF[DF$Company == "Ford",]
>DF[DF$Company == "GM",]
>DF[DF$Company == "Toyota",]
>
># My results show that it works for Ford and GM, but not Toyota
>> DF[DF$Company == "Ford",]
>   Company Person   Salary rank
>1     Ford      A 38003.52    4
>5     Ford      E 65167.38    2
>10    Ford      J 38238.80    3
>11    Ford      K 86969.08    1
>12    Ford      L 34034.90    5
>> DF[DF$Company == "GM",]
>   Company Person   Salary rank
>4       GM      D 21214.25    6
>6       GM      F 12555.51    7
>7       GM      G 26722.07    5
>13      GM      M 48208.01    4
>15      GM      O 49354.13    3
>17      GM      Q 82737.33    1
>18      GM      R 66846.67    2
>> DF[DF$Company == "Toyota",]
>   Company Person    Salary rank
>2   Toyota      B 77744.522    2
>3   Toyota      C 93470.523    1
>8   Toyota      H 38611.409    5
>9   Toyota      I  1339.033    3
>14  Toyota      N 59956.583    6
>16  Toyota      P 18621.760    4
>
>
>For reference, I'm using R 2.11.1 on a Windows 7 machine.
>
>Can anyone provide insight into how I am implementing this
>incorrectly, or give an alternate way to add such a partitioned rank
>column to a data frame?
>
>Thanks in advance,
>Brigid
>
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
>
>