# [R] Computing an ordering on subsets of a data frame

Steven McKinney smckinney at bccrc.ca
Fri Apr 20 04:54:05 CEST 2007

```
Hi Lukas,

Using by() or its cousins tapply() etc. is tricky,
as you need to properly merge results back into X.

You can do that by adding a key ID variable to X,
and carrying along that key ID variable in calls
to by() etc., though I haven't tested that approach.
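
## A minimal sketch of the key ID approach, assuming the X shown below
## (the helper column name "id" is hypothetical). Each by() group returns
## its row IDs alongside the ranks, so the stacked results can be matched
## back to the right rows of X.
X <- data.frame(A = c(1,1,1,2,2,2,3,3,3), B = c(2,3,4,3,1,1,2,1,3))
X$id <- seq_len(nrow(X))                   # key ID: one per row
byRes <- by(X, X$A, function(d)
  data.frame(id = d$id, C = rank(d$B, ties.method = "random")))
byRes <- do.call(rbind, byRes)             # stack the per-group data frames
X$C <- byRes$C[match(X$id, byRes$id)]      # merge back on the key ID
X$id <- NULL                               # drop the helper column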

You can also create a new column in X to hold the
results, and then rank B within each subset of X
in a for() loop, filling in the new column as you go.

> X <- data.frame(A = c(1,1,1,2,2,2,3,3,3), B = c(2,3,4,3,1,1,2,1,3))
> X
A B
1 1 2
2 1 3
3 1 4
4 2 3
5 2 1
6 2 1
7 3 2
8 3 1
9 3 3
>
> X$C <- rep(as.numeric(NA), nrow(X))
>
> sortLevels <- unique(X$A)
>
> for(i in seq_along(sortLevels)) {
+   sortIdxp <- X$A == sortLevels[i]
+   X$C[sortIdxp] <- rank(X$B[sortIdxp], ties.method = "random")
+ }
> X
A B C
1 1 2 1
2 1 3 2
3 1 4 3
4 2 3 3
5 2 1 1
6 2 1 2
7 3 2 2
8 3 1 1
9 3 3 3
>
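
## The same loop, wrapped in a small reusable function. The name
## rankWithinGroups and its arguments are hypothetical. which() drops
## NA comparisons, so rows with a missing group value are left as NA.
rankWithinGroups <- function(group, value, ties.method = "random") {
  out <- rep(NA_real_, length(value))      # one result slot per row
  for (g in unique(group)) {
    idx <- which(group == g)               # row positions in group g
    out[idx] <- rank(value[idx], ties.method = ties.method)
  }
  out
}
X <- data.frame(A = c(1,1,1,2,2,2,3,3,3), B = c(2,3,4,3,1,1,2,1,3))
X$C <- rankWithinGroups(X$A, X$B)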

Merging results back in after using
tapply() or by() is harder if your
data frame is in random order, but the
for() loop approach with indexing
still works fine: the logical index
sortIdxp writes each group's ranks
straight back into the rows they came from.
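
## For comparison, a minimal sketch of a tapply()-family route that needs
## no explicit merge, even for shuffled rows: split() groups B by A,
## lapply() ranks each group, and unsplit() writes the ranks back into
## the rows they came from. Y is rebuilt here so the snippet runs alone.
X <- data.frame(A = c(1,1,1,2,2,2,3,3,3), B = c(2,3,4,3,1,1,2,1,3))
set.seed(123)
Y <- X[sample(nrow(X)), ]                  # rows in random order
rankList <- lapply(split(Y$B, Y$A), rank, ties.method = "random")
Y$C <- unsplit(rankList, Y$A)              # back in Y's own row order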

> set.seed(123)
> Y <- X[sample(9), ]
> Y
A B C
3 1 4 3
7 3 2 2
9 3 3 3
6 2 1 2
5 2 1 1
1 1 2 1
2 1 3 2
8 3 1 1
4 2 3 3
> Y$C <- rep(as.numeric(NA), nrow(Y))
>
> sortLevels <- unique(Y$A)
## You can also use levels() instead of unique() if Y$A is a factor.
>
> for(i in seq_along(sortLevels)) {
+   sortIdxp <- Y$A == sortLevels[i]
+   Y$C[sortIdxp] <- rank(Y$B[sortIdxp], ties.method = "random")
+ }
> Y
A B C
3 1 4 3
7 3 2 2
9 3 3 3
6 2 1 2
5 2 1 1
1 1 2 1
2 1 3 2
8 3 1 1
4 2 3 3
> oY <- order(Y$A)
> Y[oY,]
A B C
3 1 4 3
1 1 2 1
2 1 3 2
6 2 1 2
5 2 1 1
4 2 3 3
7 3 2 2
9 3 3 3
8 3 1 1
>
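
## order() takes several sort keys. order(Y$A) alone only groups the
## rows by A, leaving each group in its shuffled order; adding C as a
## second key also sorts each group by rank. This sketch fills C with
## ave(), a shortcut not used above that returns the within-group ranks
## already in Y's row order.
X <- data.frame(A = c(1,1,1,2,2,2,3,3,3), B = c(2,3,4,3,1,1,2,1,3))
set.seed(123)
Y <- X[sample(nrow(X)), ]
Y$C <- ave(Y$B, Y$A, FUN = function(b) rank(b, ties.method = "random"))
Ys <- Y[order(Y$A, Y$C), ]                 # sort by A, then by C
Ys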

HTH

Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: smckinney at bccrc.ca
tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C.
V5Z 1L3

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Lukas Biewald
> Sent: Wednesday, April 18, 2007 2:49 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Computing an ordering on subsets of a data frame
>
> If I have a data frame X that looks like this:
>
> A B
> - -
> 1 2
> 1 3
> 1 4
> 2 3
> 2 1
> 2 1
> 3 2
> 3 1
> 3 3
>
> and I want to make another column which has the rank of B computed
> separately for each value of A.
>
> I.e. something like:
>
> A B C
> - - -
> 1 2 1
> 1 3 2
> 1 4 3
> 2 3 3
> 2 1 1
> 2 1 2
> 3 2 2
> 3 1 1
> 3 3 3
>
> by(X, X[,1], function(x) { rank(x[,1], ties.method="random") } ) almost
> seems to work, but the data is not in a frame, and I can't figure out how
> to merge it back into X properly.
>
> Thanks,
> Lukas
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help