[R] use loop or use apply?
Prasenjit Kapat
kapatp at gmail.com
Fri May 18 01:56:12 CEST 2007
Hi,
I have two matrices, A (a x d) and B (b x d). I want to get another matrix C
(a x b) such that C[i,j] is the Euclidean distance between the ith row of A and
the jth row of B (in the code below I skip the final sqrt() and work with
squared distances). In general, C[i,j] = some.function(A[i,], B[j,]).
What is the best method for doing so? (Assume a < b.)
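To make the general form C[i,j] = some.function(A[i,], B[j,]) concrete, here is
a minimal sketch using outer() over row indices. The helper name
pairwise.apply and the small example matrices are assumptions for
illustration only, and the sketch is meant for clarity, not speed.

```r
# Hypothetical helper (name chosen for illustration): builds the a x b matrix
# C with C[i, j] = some.function(A[i, ], B[j, ]).
pairwise.apply <- function(A, B, some.function) {
  # outer() passes whole index vectors, so wrap the scalar function with
  # Vectorize() to have it called once per (i, j) pair.
  f <- Vectorize(function(i, j) some.function(A[i, ], B[j, ]))
  outer(seq_len(nrow(A)), seq_len(nrow(B)), f)
}

# Small example with a = 2, b = 3, d = 2
A <- matrix(c(0, 0,
              3, 4), ncol = 2, byrow = TRUE)
B <- matrix(c(0, 0,
              6, 8,
              3, 0), ncol = 2, byrow = TRUE)

euclid <- function(x, y) sqrt(sum((x - y)^2))  # true Euclidean distance
C <- pairwise.apply(A, B, euclid)
print(C)
```

Any function of two row vectors that returns a single number can be passed in
place of euclid.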
I have been doing some exploration myself. Consider the following function,
get.f, in which 'method=1' is the rudimentary double for loop; 'method=2'
avoids one loop by constructing a bigger matrix, but doesn't use apply();
'method=3' avoids both loops by using apply() and constructing bigger
matrices; and 'method=4' avoids constructing bigger matrices by using
apply() twice.
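get.f <- function (A, B, method=2) {
  # Returns the a x b matrix of squared Euclidean distances (no sqrt taken)
  if (method == 1) {           # double for loop
    a <- nrow(A); b <- nrow(B)
    C <- matrix(NA, nrow=a, ncol=b)
    for (i in 1:a)
      for (j in 1:b)
        C[i,j] <- sum((A[i,]-B[j,])^2)
  } else if (method == 2) {    # one loop; row i of A is replicated b times
    a <- nrow(A); b <- nrow(B); d <- ncol(A)
    C <- matrix(NA, nrow=a, ncol=b)
    for (i in 1:a)
      C[i,] <- rowSums((matrix(A[i,], nrow=b, ncol=d, byrow=TRUE) - B)^2)
  } else if (method == 3) {    # apply() over rows of A, bigger matrix inside
    C <- t(apply(A, MARGIN=1, FUN=FUN1, BB=B))  # transpose is needed
  } else if (method == 4) {    # apply() twice, no bigger matrices
    C <- t(apply(A, MARGIN=1, FUN=FUN2, BB=B))
  }
  # Return C explicitly: a for loop evaluates to NULL, so without this line
  # methods 1 and 2 would return NULL instead of the matrix.
  C
}

# Squared distances from one row of A (aa) to every row of BB
FUN1 <- function(aa, BB)
  return(rowSums(
    (matrix(aa, nrow=nrow(BB), ncol=ncol(BB), byrow=TRUE) - BB)^2)
  )

# Same result as FUN1, but via a second apply() over the rows of BB
FUN2 <- function(aa, BB)
  return(apply(BB, MARGIN=1, FUN=FUN3, aa=aa))

# Squared distance between two vectors
FUN3 <- function(bb, aa) return(sum((aa-bb)^2))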
### With these methods and the following initializations,
a <- 100; b <- 1000; d <- 100; n.loop <- 20
A <- matrix(rnorm(a*d), ncol=d)
B <- matrix(rnorm(b*d), ncol=d)
all.times <- matrix(0, nrow=5, ncol=4)
rownames(all.times) <- rownames(as.matrix(system.time(NULL)))
for (i in 1:4)
  for (j in 1:n.loop)
    all.times[,i] <- all.times[,i] +
      as.matrix(system.time(C <- get.f(A=A, B=B, method=i)))
all.times <- all.times / n.loop   # average time per call, column i = method i
print(all.times)
             [,1]    [,2]    [,3]    [,4]
user.self  4.0554 1.50010 1.50130 4.51285
sys.self   0.0370 0.02420 0.01800 0.04260
elapsed    4.2705 1.58865 1.59475 6.07535
user.child 0.0000 0.00000 0.00000 0.00000
sys.child  0.0000 0.00000 0.00000 0.00000
'method=2' stands out as the best, and 'method=1' (for loops) beats 'method=4'
(two apply()s). Is that expected?
Is it possible to improve on 'method=2'?
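One direction that might be worth timing against 'method=2' is to drop the
loop entirely and use the identity sum((a-b)^2) = sum(a^2) + sum(b^2) -
2*sum(a*b), which turns the whole computation into one matrix product. The
sketch below is not one of the four methods above and has not been timed here;
the function name sq.dist and the small test matrices are assumptions for
illustration. Like get.f, it works with squared distances.

```r
# Hypothetical function (name chosen for illustration): squared Euclidean
# distances between every row of A and every row of B, without explicit loops.
sq.dist <- function(A, B) {
  # outer(...) gives |A[i,]|^2 + |B[j,]|^2; tcrossprod(A, B) is A %*% t(B)
  C <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * tcrossprod(A, B)
  pmax(C, 0)  # round-off can leave tiny negative values; clamp them to zero
}

# Check against the plain double loop on a small random example
set.seed(1)
A <- matrix(rnorm(5 * 3), ncol = 3)   # a = 5, d = 3
B <- matrix(rnorm(7 * 3), ncol = 3)   # b = 7, d = 3

C.fast <- sq.dist(A, B)

C.loop <- matrix(NA, nrow = nrow(A), ncol = nrow(B))
for (i in seq_len(nrow(A)))
  for (j in seq_len(nrow(B)))
    C.loop[i, j] <- sum((A[i, ] - B[j, ])^2)

print(all.equal(C.fast, C.loop))
```

Note that the expanded form can lose some precision when rows are nearly
identical, so the two results agree only up to floating-point tolerance. It
also applies only to squared Euclidean distance, not to an arbitrary
some.function.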
Thanks
PK
PS: The mail text seems fine in my composer; I hope it looks decent in your
reader.