# [R] vectorizing row selection

Corey Moffet cmoffet at nwrc.ars.usda.gov
Thu Apr 15 16:52:18 CEST 2004

```
Dear R-help:

I have a data frame (df1) with elements a, b, and c that identify a unique
set of conditions of interest; l and m identify other conditions; and x and
y are responses.

df1 <- data.frame(a = c(1,1,1,2,2,2,3,3,3), b = c(10,10,10,20,20,20,30,30,30),
                  c = c(100,100,100,200,200,200,300,300,300),
                  l = runif(9), m = runif(9),
                  x = c(1,2,2,2,2,1,2,2,1), y = c(3,2,1,3,2,1,3,2,1))
df1

I want to select 1 row from df1 for each combination of df1$a, df1$b, and
df1$c: the row that has the max of df1$x for that combination and then, in
the case of a tie, the max of df1$y. The selected rows go in a new data.frame:

  a  b   c         l           m x y
1 1 10 100 0.2222679 0.351739848 2 2
2 2 20 200 0.2219270 0.002530816 2 3
3 3 30 300 0.1260224 0.820658343 2 3

My method is as follows:

max.by.x <- aggregate(list(x = df1$x),
                      list(a = df1$a, b = df1$b, c = df1$c),
                      max)

df2 <- df1[1,]
for ( i in 1:length(max.by.x[,1]) ) {
  index <- which(df1$a == max.by.x$a[i] &
                 df1$b == max.by.x$b[i] &
                 df1$c == max.by.x$c[i] &
                 df1$x == max.by.x$x[i])
  index2 <- which.max(df1$y[index])
  df2[i,] <- df1[index[index2],]
}

df2

This seems to work, but for real data with 12000 rows it is really slow.
Does anyone have any ideas for improvement (e.g. vectorizing what is done
in the loop)?

With best wishes and kind regards I am

Sincerely,

Corey A. Moffet
Rangeland Scientist

USDA-ARS
Northwest Watershed Research Center
800 Park Blvd, Plaza IV, Suite 105
Boise, ID 83712-7716
Voice: (208) 422-0718
FAX:   (208) 334-1502

```
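One possible way to vectorize the selection described above, offered as a sketch and not as the definitive answer: sort the rows so that, within each (a, b, c) combination, the row with the largest x (and, among ties, the largest y) comes first, then keep only the first row of each combination. It uses only base R (`order()` and `duplicated()`). The `set.seed()` call and the `sorted` and `ord` names are additions for illustration and are not in the original post; negating x and y to get descending order assumes both columns are numeric.

```r
# Example data from the post; set.seed() is added only so l and m are reproducible.
set.seed(1)
df1 <- data.frame(a = c(1,1,1,2,2,2,3,3,3), b = c(10,10,10,20,20,20,30,30,30),
                  c = c(100,100,100,200,200,200,300,300,300),
                  l = runif(9), m = runif(9),
                  x = c(1,2,2,2,2,1,2,2,1), y = c(3,2,1,3,2,1,3,2,1))

# Sort by the grouping columns, then by descending x, then by descending y,
# so the wanted row is the first one in each (a, b, c) group.
ord <- order(df1$a, df1$b, df1$c, -df1$x, -df1$y)
sorted <- df1[ord, ]

# duplicated() flags every repeat of an (a, b, c) key after its first
# occurrence, so negating it keeps exactly one row per group.
df2 <- sorted[!duplicated(sorted[, c("a", "b", "c")]), ]
rownames(df2) <- NULL
df2
```

Because `order()` leaves unresolved ties in their original order, rows that tie on both x and y should resolve to the earliest such row in df1, which matches what `which.max()` picks in the loop. The whole selection is one sort plus one pass over the keys, so it avoids growing df2 row by row.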