[R] speeding up "sum of squared differences" calculation
William Dunlap
wdunlap at tibco.com
Tue Oct 22 18:37:51 CEST 2013
outer() and dist() are good for speed on smaller problems but they
require O(length(X)^2) memory. This can slow things down or even
stop the calculations for large problems. You can gain a lot of speed
without the memory problem by summing things up in chunks.
E.g., On a Linux box I compared
fChunk <- function(x) {
ans <- 0
for(i in seq_along(x)[-1]) {
ans <- ans + sum( (x[i] - x[seq_len(i-1)])^2)
}
2*ans
}
with the original algorithm (slightly cleaned up)
fOrig <-function (x) {
ans <- 0
for (i in seq_along(x)[-1]) {
for (j in seq_len(i)) {
ans <- ans + (x[i] - x[j])^2
}
}
2 * ans
}
and the ones based on outer() and dist().
fOuter <- function(x) sum(outer(x, x, FUN=`-`)^2)
fDist <- function(x) 2 * sum(dist(x)^2)
for a vector of length 10000.
> z <- rnorm(10^4)
> t(sapply(list(fOrig=fOrig, fChunk=fChunk, fDist=fDist, fOuter=fOuter),
function(f)tryCatch(system.time(f(z)), error=function(e)NA)[1:3]))
user.self sys.self elapsed
fOrig 90.262 0.000 90.378
fChunk 0.776 0.004 0.779
fDist 1.092 0.276 1.370
fOuter 0.740 1.068 1.855
They all give slightly different answers because the round-off error depends on
the order in which sums and squares are done.
Of course, the Rcpp version has no memory penalty and is faster than any of them.
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of S Ellison
> Sent: Tuesday, October 22, 2013 6:08 AM
> To: r-help at r-project.org
> Cc: Ken Knoblauch
> Subject: Re: [R] speeding up "sum of squared differences" calculation
>
> > Conclusion: hard to beat outer() for this purpose in R
> ... unless you use Ken Knoblauch's suggestion of dist() or Rccp.
>
> Nice indeed.
>
>
> S Ellison
>
>
> *******************************************************************
> This email and any attachments are confidential. Any use...{{dropped:8}}
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list