[R] speeding up "sum of squared differences" calculation

William Dunlap wdunlap at tibco.com
Tue Oct 22 18:37:51 CEST 2013


outer() and dist() are good for speed on smaller problems but they
require O(length(X)^2) memory.  This can slow things down or even
stop the calculations for large problems.  You can gain a lot of speed
without the memory problem by summing things up in chunks.
E.g., On a Linux box I compared
    fChunk <- function(x) {
        ans <- 0
        for(i in seq_along(x)[-1]) {
            ans <- ans + sum( (x[i] - x[seq_len(i-1)])^2)
        }
        2*ans
    }
with the original algorithm (slightly cleaned up)
    fOrig <-function (x) {
        ans <- 0
        for (i in seq_along(x)[-1]) {
            for (j in seq_len(i)) {
                ans <- ans + (x[i] - x[j])^2
            }
        }
        2 * ans
    }
and the ones based on outer() and dist().
   fOuter <- function(x) sum(outer(x, x, FUN=`-`)^2)
   fDist <- function(x) 2 * sum(dist(x)^2)
for a vector of length 10000.

   > z <- rnorm(10^4)
   > t(sapply(list(fOrig=fOrig, fChunk=fChunk, fDist=fDist, fOuter=fOuter),
            function(f)tryCatch(system.time(f(z)), error=function(e)NA)[1:3]))
          user.self sys.self elapsed
   fOrig     90.262    0.000  90.378
   fChunk     0.776    0.004   0.779
   fDist      1.092    0.276   1.370
   fOuter     0.740    1.068   1.855

They all give slightly different answers because the round-off error depends on
the order in which sums and squares are done.

Of course, the Rcpp version has no memory penalty and is faster than any of them.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of S Ellison
> Sent: Tuesday, October 22, 2013 6:08 AM
> To: r-help at r-project.org
> Cc: Ken Knoblauch
> Subject: Re: [R] speeding up "sum of squared differences" calculation
> 
> > Conclusion: hard to beat outer() for this purpose in R
> ... unless you use Ken Knoblauch's suggestion of dist() or Rccp.
> 
> Nice indeed.
> 
> 
> S Ellison
> 
> 
> *******************************************************************
> This email and any attachments are confidential. Any use...{{dropped:8}}
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list