[R] Add rank column to data frame as in SQL...
arun
smartpink111 at yahoo.com
Fri Jun 1 20:21:29 CEST 2012
Hi,
Try this:
DF1<-DF[with(DF,rev(order(Company,Salary))),]
DF2<-within(DF1,rank<-ave(Salary,Company,FUN=function(x)rev(order(x))))
> DF2
Company Person Salary rank
15 Toyota O 68278.808 1
5 Toyota E 57487.220 2
11 Toyota K 56090.075 3
12 Toyota L 52602.772 4
14 Toyota N 50764.182 5
18 Toyota R 25816.593 6
2 Toyota B 20754.511 7
6 Toyota F 7706.438 8
13 GM M 98509.522 1
9 GM I 92861.520 2
4 GM D 59571.200 3
1 GM A 51116.978 4
17 GM Q 23886.868 5
3 GM C 22865.814 6
8 Ford H 64279.549 1
16 Ford P 60154.122 2
10 Ford J 59809.242 3
7 Ford G 3554.058 4
A.K.
________________________________
From: Brigid Rotigel <brotigel at gmail.com>
To: arun <smartpink111 at yahoo.com>
Cc: R help <r-help at r-project.org>
Sent: Friday, June 1, 2012 2:04 PM
Subject: Re: [R] Add rank column to data frame as in SQL...
Sorry, I meant to put that it didn't work for "GM", not "Toyota".
> DF[DF$Company=="GM",]
Company Person Salary rank
1 GM A 7067.905 4
3 GM C 31627.171 8
5 GM E 66200.508 7
7 GM G 91287.592 3
10 GM J 33239.467 6
13 GM M 47854.525 5
14 GM N 76631.067 2
16 GM P 87532.133 1
In this data, person "G" earned 91287, but has rank 3 even though this is the highest salary in this partition. The lowest salary earned is the 7067 by person "A", so this should have rank = 8, but in fact this gives rank = 4.
Is that more clear?
Thanks,
Brigid
On Fri, Jun 1, 2012 at 1:48 PM, arun <smartpink111 at yahoo.com> wrote:
Hi,
>
>Couldn't find any problems if I understand what you mean.
>
>A.K.
>
>
>
>> DF <-within(DF,rank<-ave(Salary,Company,FUN=function(x)rev(order(x))))
>> DF
> Company Person Salary rank
>1 GM A 7067.905 4
>2 Ford B 9946.616 3
>3 GM C 31627.171 8
>4 Toyota D 51863.426 1
>5 GM E 66200.508 7
>6 Toyota F 40683.019 4
>7 GM G 91287.592 3
>8 Toyota H 29360.337 2
>9 Toyota I 45906.573 7
>10 GM J 33239.467 6
>11 Ford K 65087.047 2
>12 Toyota L 25801.678 3
>13 GM M 47854.525 5
>14 GM N 76631.067 2
>15 Toyota O 8424.691 5
>16 GM P 87532.133 1
>17 Toyota Q 33907.294 6
>18 Ford R 83944.035 1
>> DF[DF$Company=="Ford",]
> Company Person Salary rank
>2 Ford B 9946.616 3
>11 Ford K 65087.047 2
>18 Ford R 83944.035 1
>> DF[DF$Company=="Toyota",]
> Company Person Salary rank
>4 Toyota D 51863.426 1
>6 Toyota F 40683.019 4
>8 Toyota H 29360.337 2
>9 Toyota I 45906.573 7
>12 Toyota L 25801.678 3
>15 Toyota O 8424.691 5
>17 Toyota Q 33907.294 6
>> DF[DF$Company=="GM",]
> Company Person Salary rank
>1 GM A 7067.905 4
>3 GM C 31627.171 8
>5 GM E 66200.508 7
>7 GM G 91287.592 3
>10 GM J 33239.467 6
>13 GM M 47854.525 5
>14 GM N 76631.067 2
>16 GM P 87532.133 1
>
>
>
>
>----- Original Message -----
>From: Brigid Mooney <bkmooney at gmail.com>
>To: r-help at r-project.org
>Cc:
>Sent: Friday, June 1, 2012 11:23 AM
>Subject: [R] Add rank column to data frame as in SQL...
>
>Hopefully this is an easy problem...
>
>I'm trying to add a partitioned rank column to a data frame where the
>rank is calculated separately across a partition by categories, the
>way you could easily do in SQL. I found this solution in the archives
>that looked like it might work:
>
>http://tolstoy.newcastle.edu.au/R/e11/help/10/09/8675.html
>
>The example has a data frame with several car companies, and employee
>salaries within them. A column is then added to the data.frame which
>should give the descending rank for each employee, partitioned by
>company.
>
>
>But when I implemented it, the results weren't the expected rankings.
>What am I doing wrong?
>
>set.seed(1)
>DF <- data.frame(Company=sample(c("Ford","Toyota","GM"),size=18,replace=TRUE),
>Person=LETTERS[1:18],Salary=runif(18)*1e5)
>DF <- within(DF, rank <- ave(Salary, Company, FUN=function(x)rev(order(x))))
>
># Then checking each category manually
>DF[DF$Company == "Ford",]
>DF[DF$Company == "GM",]
>DF[DF$Company == "Toyota",]
>
># My results show that it works for Ford and GM, but not Toyota
>> DF[DF$Company == "Ford",]
> Company Person Salary rank
>1 Ford A 38003.52 4
>5 Ford E 65167.38 2
>10 Ford J 38238.80 3
>11 Ford K 86969.08 1
>12 Ford L 34034.90 5
>> DF[DF$Company == "GM",]
> Company Person Salary rank
>4 GM D 21214.25 6
>6 GM F 12555.51 7
>7 GM G 26722.07 5
>13 GM M 48208.01 4
>15 GM O 49354.13 3
>17 GM Q 82737.33 1
>18 GM R 66846.67 2
>> DF[DF$Company == "Toyota",]
> Company Person Salary rank
>2 Toyota B 77744.522 2
>3 Toyota C 93470.523 1
>8 Toyota H 38611.409 5
>9 Toyota I 1339.033 3
>14 Toyota N 59956.583 6
>16 Toyota P 18621.760 4
>
>
>For reference, I'm using R 2.11.1 on a Windows 7 machine.
>
>Can anyone provide insight into how I am implementing this
>incorrectly, or give an alternate way to add such a partitioned rank
>column to a data frame?
>
>Thanks in advance,
>Brigid
>
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
>
>
More information about the R-help
mailing list