Steven McKinney smckinney at bccrc.ca
Fri Apr 20 04:54:05 CEST 2007

```Hi Lukas,

Using by() or its cousins tapply() etc. is tricky,
as you need to properly merge results back into X.

You can do that by adding a key ID variable to X,
and carrying along that key ID variable in calls
to by() etc., though I haven't tested out a method.
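
## A minimal sketch of the key ID idea described above, not a tested
## recipe from this post. The column name "id" and the objects "pieces"
## and "ranks" are hypothetical; by(), do.call(), rbind() and merge()
## are base R functions.
X <- data.frame(A = c(1,1,1,2,2,2,3,3,3), B = c(2,3,4,3,1,1,2,1,3))
X$id <- seq_len(nrow(X))    # key that identifies each original row
## Rank B within each level of A, carrying the key along.
pieces <- by(X, X$A, function(d)
    data.frame(id = d$id, C = rank(d$B, ties.method = "random")))
ranks <- do.call(rbind, pieces)    # stack the per-group results
## merge() matches rows on the key, so the order of 'ranks' does not matter.
X <- merge(X, ranks, by = "id")
X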

You can also create a new column in X to hold the
results, and then rank the subsections of X one
group at a time in a for() loop.

> X <- data.frame(A = c(1,1,1,2,2,2,3,3,3), B = c(2,3,4,3,1,1,2,1,3))
> X
  A B
1 1 2
2 1 3
3 1 4
4 2 3
5 2 1
6 2 1
7 3 2
8 3 1
9 3 3
>
> X$C <- rep(as.numeric(NA), nrow(X))
>
> sortLevels <- unique(X$A)
>
> for(i in seq(along = sortLevels)) {
+   sortIdxp <- X$A == sortLevels[i]
+   X$C[sortIdxp] <- rank(X$B[sortIdxp], ties.method = "random")
+ }
> X
  A B C
1 1 2 1
2 1 3 2
3 1 4 3
4 2 3 3
5 2 1 1
6 2 1 2
7 3 2 2
8 3 1 1
9 3 3 3
>

Merging results back in after using
tapply() or by() is harder if your
data frame is in random order, but the
for() loop approach with indexing
still works fine.

> set.seed(123)
> Y <- X[sample(9), ]
> Y
  A B C
3 1 4 3
7 3 2 2
9 3 3 3
6 2 1 2
5 2 1 1
1 1 2 1
2 1 3 2
8 3 1 1
4 2 3 3
> Y$C <- rep(as.numeric(NA), nrow(Y))
>
> sortLevels <- unique(Y$A)
## You can also use levels() instead of unique() if Y$A is a factor.
>
> for(i in seq(along = sortLevels)) {
+   sortIdxp <- Y$A == sortLevels[i]
+   Y$C[sortIdxp] <- rank(Y$B[sortIdxp], ties.method = "random")
+ }
> Y
  A B C
3 1 4 3
7 3 2 2
9 3 3 3
6 2 1 2
5 2 1 1
1 1 2 1
2 1 3 2
8 3 1 1
4 2 3 3
> oY <- order(Y$A)
> Y[oY,]
  A B C
3 1 4 3
1 1 2 1
2 1 3 2
6 2 1 2
5 2 1 1
4 2 3 3
7 3 2 2
9 3 3 3
8 3 1 1
>
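
## A possible loop-free alternative, offered as a sketch rather than as
## the method used in this post: base R's ave() applies FUN within each
## level of the grouping variable and returns the results in the original
## row order, so no merging step is needed even when the rows are shuffled.
## The anonymous wrapper around rank() is only there to pass ties.method.
set.seed(123)
X <- data.frame(A = c(1,1,1,2,2,2,3,3,3), B = c(2,3,4,3,1,1,2,1,3))
Y <- X[sample(9), ]    # rows in random order
Y$C <- ave(Y$B, Y$A, FUN = function(b) rank(b, ties.method = "random"))
Y[order(Y$A), ]        # view grouped by A to check the within-group ranks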

HTH

