[R] Add rank column to data frame as in SQL...
arun
smartpink111 at yahoo.com
Fri Jun 1 19:48:36 CEST 2012
Hi,
Couldn't find any problems if I understand what you mean.
A.K.
> DF <-within(DF,rank<-ave(Salary,Company,FUN=function(x)rev(order(x))))
> DF
Company Person Salary rank
1 GM A 7067.905 4
2 Ford B 9946.616 3
3 GM C 31627.171 8
4 Toyota D 51863.426 1
5 GM E 66200.508 7
6 Toyota F 40683.019 4
7 GM G 91287.592 3
8 Toyota H 29360.337 2
9 Toyota I 45906.573 7
10 GM J 33239.467 6
11 Ford K 65087.047 2
12 Toyota L 25801.678 3
13 GM M 47854.525 5
14 GM N 76631.067 2
15 Toyota O 8424.691 5
16 GM P 87532.133 1
17 Toyota Q 33907.294 6
18 Ford R 83944.035 1
> DF[DF$Company=="Ford",]
Company Person Salary rank
2 Ford B 9946.616 3
11 Ford K 65087.047 2
18 Ford R 83944.035 1
> DF[DF$Company=="Toyota",]
Company Person Salary rank
4 Toyota D 51863.426 1
6 Toyota F 40683.019 4
8 Toyota H 29360.337 2
9 Toyota I 45906.573 7
12 Toyota L 25801.678 3
15 Toyota O 8424.691 5
17 Toyota Q 33907.294 6
> DF[DF$Company=="GM",]
Company Person Salary rank
1 GM A 7067.905 4
3 GM C 31627.171 8
5 GM E 66200.508 7
7 GM G 91287.592 3
10 GM J 33239.467 6
13 GM M 47854.525 5
14 GM N 76631.067 2
16 GM P 87532.133 1
----- Original Message -----
From: Brigid Mooney <bkmooney at gmail.com>
To: r-help at r-project.org
Cc:
Sent: Friday, June 1, 2012 11:23 AM
Subject: [R] Add rank column to data frame as in SQL...
Hopefully this is an easy problem...
I'm trying to add a partitioned rank column to a data frame where the
rank is calculated separately across a partition by categories, the
way you could easily do in SQL. I found this solution in the archives
that looked like it might work:
http://tolstoy.newcastle.edu.au/R/e11/help/10/09/8675.html
The example has a data frame with several car companies, and employee
salaries within them. A column is then added to the data.frame which
should give the descending rank for each employee, partitioned by
company.
But when I implemented it, the results weren't the expected rankings.
What am I doing wrong?
set.seed(1)
DF <- data.frame(Company=sample(c("Ford","Toyota","GM"),size=18,replace=TRUE),
Person=LETTERS[1:18],Salary=runif(18)*1e5)
DF <- within(DF, rank <- ave(Salary, Company, FUN=function(x)rev(order(x))))
# Then checking each category manually
DF[DF$Company == "Ford",]
DF[DF$Company == "GM",]
DF[DF$Company == "Toyota",]
# My results show that it works for Ford and GM, but not Toyota
> DF[DF$Company == "Ford",]
Company Person Salary rank
1 Ford A 38003.52 4
5 Ford E 65167.38 2
10 Ford J 38238.80 3
11 Ford K 86969.08 1
12 Ford L 34034.90 5
> DF[DF$Company == "GM",]
Company Person Salary rank
4 GM D 21214.25 6
6 GM F 12555.51 7
7 GM G 26722.07 5
13 GM M 48208.01 4
15 GM O 49354.13 3
17 GM Q 82737.33 1
18 GM R 66846.67 2
> DF[DF$Company == "Toyota",]
Company Person Salary rank
2 Toyota B 77744.522 2
3 Toyota C 93470.523 1
8 Toyota H 38611.409 5
9 Toyota I 1339.033 3
14 Toyota N 59956.583 6
16 Toyota P 18621.760 4
For reference, I'm using R 2.11.1 on a Windows 7 machine.
Can anyone provide insight into how I am implementing this
incorrectly, or give an alternate way to add such a partitioned rank
column to a data frame?
Thanks in advance,
Brigid
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list