[R] speeding up "sum of squared differences" calculation

Hadley Wickham h.wickham at gmail.com
Tue Oct 22 15:43:24 CEST 2013


> There's little practical difference; both hover from 0.00 to 0.03 s system time. I could barely tell the difference even averaged over 100 runs; I was getting an average around 0.007 (system time) and 2.5s user time for both methods.

It's almost always better to use a high-precision timer, such as the
one implemented in the microbenchmark package:

library(microbenchmark)

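ssqdif <- function(X, Y=X) {
  # From 'outer' without modification: repeat each element of Y
  # length(X) times and recycle X to the same length, so that
  # X[k] - Y[k] runs over every pair (X[i], Y[j])
  Y <- rep(Y, rep.int(length(X), length(Y)))
  X <- rep(X, times = ceiling(length(Y)/length(X)))
  # For this case: sum of the squared pairwise differences
  sum((X-Y)^2) # SLIGHTLY quicker than d <- X-Y; sum(d*d)
}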
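The two rep() lines borrowed from outer() are what let a single vectorised subtraction cover every pair. A small illustration with toy vectors (the names x, y, xrep and yrep and their values are arbitrary choices for this sketch):

```r
x <- c(1, 2, 3)
y <- c(10, 20)

# Each element of y is repeated length(x) times ...
yrep <- rep(y, rep.int(length(x), length(y)))
yrep
# [1] 10 10 10 20 20 20

# ... and x is recycled to the same total length
xrep <- rep(x, times = ceiling(length(yrep) / length(x)))
xrep
# [1] 1 2 3 1 2 3

# Every pairwise difference x[i] - y[j], in the same (column-major)
# order as the entries of outer(x, y, "-")
xrep - yrep
# [1]  -9  -8  -7 -19 -18 -17
```

This is likely why the two functions time almost identically: outer() performs the same expansion and the same length(X) * length(Y) subtraction internally, then only attaches dimensions to the result.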

outerdif <- function(X, Y = X) {
  # length(X) x length(Y) matrix of differences X[i] - Y[j]
  gg <- outer(X, Y, FUN="-")
  sum(gg*gg)
}
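Both functions compute S = sum over all i, j of (X[i] - Y[j])^2. A different technique, not part of the original post, is to expand the square: S = m*sum(X^2) + n*sum(Y^2) - 2*sum(X)*sum(Y), with n = length(X) and m = length(Y). This avoids building any n-by-m intermediate. A minimal sketch, checked against a brute-force outer() version; the names ssqdif_closed and ssqdif_brute are hypothetical. Note that the expansion can lose precision through cancellation when the values are large relative to their spread.

```r
# Hypothetical closed-form version (not from the original post):
#   sum_{i,j} (X[i] - Y[j])^2 = m*sum(X^2) + n*sum(Y^2) - 2*sum(X)*sum(Y)
ssqdif_closed <- function(X, Y = X) {
  n <- length(X)
  m <- length(Y)
  m * sum(X^2) + n * sum(Y^2) - 2 * sum(X) * sum(Y)
}

# Brute-force reference: materialise every pairwise difference
ssqdif_brute <- function(X, Y = X) {
  d <- outer(X, Y, FUN = "-")
  sum(d * d)
}

set.seed(1)
x <- runif(1000)
y <- runif(500)

# The two should agree up to floating-point rounding
all.equal(ssqdif_closed(x), ssqdif_brute(x))
all.equal(ssqdif_closed(x, y), ssqdif_brute(x, y))
```

If you want to time it, ssqdif_closed(X) could be added as a third expression in the microbenchmark() call.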

X <- runif(1000)

microbenchmark(
  ssqdif(X),
  outerdif(X)
)

Unit: milliseconds
        expr      min       lq   median       uq      max neval
   ssqdif(X) 9.035473 9.912253 14.65940 16.34044 68.30620   100
 outerdif(X) 8.962955 9.647820 14.85338 17.00048 66.89351   100

Looking at the range of values, you can see that the performance is
indeed almost identical.
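One rough way to put "almost identical" into numbers is to compare the gap between the two medians with the spread within each expression. The figures below are copied by hand from the table above; another machine or run will give different values.

```r
# Timings in milliseconds, copied from the microbenchmark table above
ssqdif_ms   <- c(lq = 9.912253, median = 14.65940, uq = 16.34044)
outerdif_ms <- c(lq = 9.647820, median = 14.85338, uq = 17.00048)

# Absolute and relative gap between the two medians (about 0.19 ms, ~1.3%)
median_gap <- unname(outerdif_ms["median"] - ssqdif_ms["median"])
rel_gap    <- median_gap / unname(ssqdif_ms["median"])

# Interquartile range of each expression: several milliseconds
iqr_ssqdif   <- unname(ssqdif_ms["uq"] - ssqdif_ms["lq"])
iqr_outerdif <- unname(outerdif_ms["uq"] - outerdif_ms["lq"])

# The median gap is only a few percent of either IQR
median_gap / iqr_ssqdif
median_gap / iqr_outerdif
```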

Hadley

-- 
Chief Scientist, RStudio
http://had.co.nz/