[R] apply and sort vs vectorized order

Mon Aug 18 13:13:38 CEST 2003

Dear all,
While trying to solve a problem I had (see the thread "putting NAs at the end"),
I noticed a difference in run time (as reported by system.time) between using
apply with sort (or order) to order each row or column of a matrix and using a
vectorized function I wrote.
Using apply is much faster when the number of loops (the number of rows or
columns to order) is low, but much slower when the number of loops is high and
the other dimension is short.
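To make "number of loops" concrete, this sketch counts how many times apply() calls its function: once per row with MARGIN = 1, once per column with MARGIN = 2. It uses a 10000 x 10 matrix as a smaller stand-in for the long, thin matrix in the benchmark further down; the counter n_calls and the wrapper count_order are hypothetical names used only for illustration.

```r
# Hypothetical wrapper around order() that counts its own calls
n_calls <- 0
count_order <- function(x) {
  n_calls <<- n_calls + 1   # one increment per row or column processed
  order(x)
}

# Long, thin matrix: many short rows, few long columns
A <- matrix(rnorm(1e5), nrow = 1e4, ncol = 10)

invisible(apply(A, 1, count_order))
calls_rows <- n_calls       # one call per row, each on a length-10 vector

n_calls <- 0
invisible(apply(A, 2, count_order))
calls_cols <- n_calls       # one call per column, each on a length-10000 vector

c(rows = calls_rows, cols = calls_cols)
```

Each of those calls carries R's per-call overhead, which is what a single call to order() on the whole matrix avoids.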
Here is my function:

order.rc <- function(A, row.column = 1, na.last = TRUE, decreasing = FALSE,
                     return.sort = TRUE) {
  # shift A so that min(A) = 0, then rescale it so that max(A) = 0.1
  A.order <- A - min(A, na.rm = TRUE)
  A.range <- max(A.order, na.rm = TRUE)
  if (A.range > 0) A.order <- A.order / (A.range * 10)
  # for a decreasing order, flip the values within [0, 0.1]; the keys are then
  # always ordered increasingly, so the rows (or columns) keep their sequence
  if (decreasing) A.order <- 0.1 - A.order
  # NAs get 0.9 (after every value of their row/column) if na.last = TRUE,
  # or -0.5 (before every value, without tying with the minimum) otherwise
  A.order[is.na(A.order)] <- if (na.last) 0.9 else -0.5
  # the integer part of each key is the row index when ordering each row
  # (row.column = 1), or the column index when ordering each column
  group <- if (row.column == 1) {
    rep(1:nrow(A), ncol(A))          # row index of each element
  } else {
    rep(1:ncol(A), each = nrow(A))   # column index of each element
  }
  keys <- order(A.order + group)
  # return either the ordering indexes (linear indexes into A) ...
  if (!return.sort) return(keys)
  # ... or a matrix with each row (or column) sorted
  A.sorted <- A[keys]
  if (row.column == 1) {
    # sorted values come out row by row, so refill the matrix by rows
    matrix(A.sorted, nrow = nrow(A), ncol = ncol(A), byrow = TRUE)
  } else {
    matrix(A.sorted, nrow = nrow(A), ncol = ncol(A))
  }
}
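As a small worked example of the key encoding, this sketch builds the keys by hand for a 2 x 3 matrix: integer part = row index, fractional part = value rescaled into [0, 0.1], NA mapped to 0.9. It is a simplified illustration of the same idea (NAs last, increasing order only), not order.rc itself.

```r
# 2 x 3 example matrix, filled column by column:
#   row 1:  5  NA  3
#   row 2: -2   1  0
A <- matrix(c(5, -2, NA, 1, 3, 0), nrow = 2)

# rescale the values into [0, 0.1]; here that is (A + 2) / 70
rng  <- max(A, na.rm = TRUE) - min(A, na.rm = TRUE)
frac <- (A - min(A, na.rm = TRUE)) / (rng * 10)
frac[is.na(frac)] <- 0.9          # NAs sort after every value in their row

# keys of row 1 lie in [1, 1.9], keys of row 2 in [2, 2.9]: no overlap
key <- row(A) + frac
idx <- order(key)                 # c(5, 1, 3, 2, 6, 4)

# the sorted values come out row by row, so refill the matrix by rows
sorted <- matrix(A[idx], nrow = nrow(A), byrow = TRUE)
sorted
#      [,1] [,2] [,3]
# [1,]    3    5   NA
# [2,]   -2    0    1
```

The result matches t(apply(A, 1, sort, na.last = TRUE)), but it is obtained with one call to order() instead of one call per row.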

# Some timing comparisons
# CHANGE Nrandom ACCORDING TO YOUR SYSTEM
Nrandom <- 1000
# square matrix (Nrandom x Nrandom) with a few NAs
A <- matrix(rnorm(Nrandom*Nrandom), nrow=Nrandom, ncol=Nrandom)
A[rbind(c(100,3), c(90,9), c(40,6))] <- NA
system.time(A.r <- order.rc(A))
# na.last=TRUE keeps the NAs, as order.rc does (sort() drops them by default)
system.time(A.s1 <- apply(A, 1, sort, na.last=TRUE))
system.time(A.c <- order.rc(A, row.column=2))
system.time(A.s2 <- apply(A, 2, sort, na.last=TRUE))

# long, thin matrix (Nrandom*Nrandom/10 rows x 10 columns) with the same NAs
A <- matrix(rnorm(Nrandom*Nrandom), nrow=Nrandom*Nrandom/10, ncol=10)
A[rbind(c(100,3), c(90,9), c(40,6))] <- NA
system.time(A.r <- order.rc(A))
system.time(A.s1 <- apply(A, 1, order))
system.time(A.c <- order.rc(A, row.column=2))
system.time(A.s2 <- apply(A, 2, order))

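I think only the third apply call (apply(A,1,order) on the long, thin matrix,
100000 x 10 with Nrandom=1000) is slower than my function, because the number
of "loops" (one call to order per row, 100000 calls) is too high; there my
function is faster despite having to order one long vector of all the values.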
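A possible alternative, sketched with base R only: order() accepts several sort keys, so passing row(A) (or col(A)) as the first key and the values as the second sorts every row (or column) in a single call, without rescaling the values into the fractional part of one key. The helper name sort_margin is hypothetical, decreasing order is obtained by negating the (numeric) values, and na.last must be TRUE or FALSE (not NA) so that the matrix keeps its shape. Whether it beats apply() on a given shape would need checking with the same system.time() calls as above.

```r
# Sort every row (by = "row") or every column (by = "col") of a numeric matrix
# with one call to order(), using the row or column index as the first key.
sort_margin <- function(A, by = c("row", "col"), na.last = TRUE,
                        decreasing = FALSE) {
  by <- match.arg(by)
  group <- if (by == "row") row(A) else col(A)   # first key: group index
  value <- if (decreasing) -A else A              # second key; NA stays NA
  idx <- order(group, value, na.last = na.last)   # NAs placed within each group
  if (by == "row") {
    matrix(A[idx], nrow = nrow(A), byrow = TRUE)  # refill row by row
  } else {
    matrix(A[idx], nrow = nrow(A))                # refill column by column
  }
}

# usage on a small matrix with one NA
A <- matrix(c(5, -2, NA, 1, 3, 0), nrow = 2)
sort_margin(A, "row")
sort_margin(A, "col", decreasing = TRUE)
```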
Thanks for any clarifications on how all this works,
Angel