[R] Help optimizing EMD::extrema()
Mike Lawrence
Mike.Lawrence at dal.ca
Fri Feb 11 18:27:40 CET 2011
Hi folks,
I'm attempting to use the EMD package to analyze some neuroimaging
data (timeseries with 64 channels sampled across 1 million time points
within each of 20 people). I found that processing a single channel of
data using EMD::emd() took about 8 hours. Exploration using Rprof()
suggested that most of the compute time was spent in EMD::extrema().
Looking at the code for EMD::extrema(), I managed to find one obvious
speedup (switching from employing rbind() to c()) and I suspect that
there may be a way to further speed things up by pre-allocating all
the objects that are currently being created with c(), but I'm having
trouble understanding the code sufficiently to know when/where to try
this and what sizes to set as the default pre-allocation length. Below
I include code that demonstrates the speedup I achieved by eliminating
calls to rbind(), and also demonstrates that only a few calls to c()
seem to be responsible for most of the compute time. The files
"extrema_c.R" and "extrema_c2.R" are available at:
https://gist.github.com/822691
Any suggestions/help would be greatly appreciated.
#load the EMD library for the default version of extrema
library(EMD)
#some data to process
values = rnorm(1e4)
#profile the default version of extrema
Rprof(tmp <- tempfile())
temp = extrema(values)
Rprof()
summaryRprof(tmp)
#1.2s total with most time spent doing rbind
unlink(tmp)
#load an rbind-free version of extrema
source('extrema_c.R')
Rprof(tmp <- tempfile())
temp = extrema_c(values)
Rprof()
summaryRprof(tmp) #much faster! .5s total
unlink(tmp)
#still, it encounters slowdowns with lots of data
values = rnorm(1e5)
Rprof(tmp <- tempfile())
temp = extrema_c(values)
Rprof()
summaryRprof(tmp)
#44s total, hard to see what's taking up so much time
unlink(tmp)
#load an rbind-free version of extrema that labels each call to c()
source('extrema_c2.R')
Rprof(tmp <- tempfile())
temp = extrema_c2(values)
Rprof()
summaryRprof(tmp)
#same time as above, but now we see that it spends more time in
#certain calls to c() than others
unlink(tmp)
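
To make the pre-allocation idea above concrete, here is a minimal sketch of the pattern. It is not the EMD::extrema() code: find_peaks_grow() and find_peaks_prealloc() are hypothetical helpers that only locate strict interior local maxima (no plateau or zero-crossing handling), and the upper bound used for the pre-allocated length is an assumption that holds for this simplified case only. Both assume length(x) >= 3.

```r
# Hypothetical stand-ins for the real extrema logic: collect the indices of
# strict interior local maxima in two ways.

# Version 1: grow the result with c(), as in the rbind-free rewrite.
find_peaks_grow <- function(x) {
  peaks <- integer(0)
  for (i in 2:(length(x) - 1)) {
    if (x[i] > x[i - 1] && x[i] > x[i + 1]) {
      peaks <- c(peaks, i)  # allocates and copies a new vector on each append
    }
  }
  peaks
}

# Version 2: pre-allocate once, fill by index, trim at the end.
find_peaks_prealloc <- function(x) {
  # Two strict local maxima cannot be adjacent, so ceiling(length(x) / 2)
  # is a safe upper bound on how many there can be (assumption for this
  # simplified definition of a peak).
  peaks <- integer(ceiling(length(x) / 2))
  n <- 0L  # number of slots filled so far
  for (i in 2:(length(x) - 1)) {
    if (x[i] > x[i - 1] && x[i] > x[i + 1]) {
      n <- n + 1L
      peaks[n] <- i
    }
  }
  peaks[seq_len(n)]  # drop the unused tail
}

set.seed(1)
values <- rnorm(1e4)
peaks_grow <- find_peaks_grow(values)
peaks_prealloc <- find_peaks_prealloc(values)
identical(peaks_grow, peaks_prealloc)
```

Wrapping each call in system.time() should show whether the difference matters at a given input size. If no safe upper bound is known for an object in the real function, one common alternative is to start with a guess and double the vector's length whenever it fills up.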