[R] Help optimizing EMD::extrema()

Mike Lawrence Mike.Lawrence at dal.ca
Mon Feb 14 04:07:39 CET 2011


Wow, great work! This makes a huge difference (i.e. 5 mins to process
an electrode instead of 5 hours).

I believe that the ndata and ndatam1 are there to let EMD::emd() and
EMD::extractimf() (the functions that call EMD::extrema()) implement
various boundary corrections. I personally don't intend on examining
the start or end in great detail, so I think the uncorrected version
is fine for my application.

Thanks again!

Mike


On Sun, Feb 13, 2011 at 6:57 PM, William Dunlap <wdunlap at tibco.com> wrote:
> I did not try to emulate the ndata nad ndatam1 arguments
> to extrema(), as I didn't see what they were for.  The
> help file simply says they are the length of the first
> argument and that minus 1 and that is what their default
> values are.  If they do not have their default values
> then extrema() frequently dies.  You could add them them
> to the argument list and not use them, or check that
> they are what they default to, as in
>   function(x, ndata, ndatam1) {
>      stopifnot(length(x)==ndata, ndata-1==ndatam1)
>      ... rest of code ...
> If the check fails then someone needs to say or figure out
> what they are for.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org
>> [mailto:r-help-bounces at r-project.org] On Behalf Of William Dunlap
>> Sent: Sunday, February 13, 2011 2:08 PM
>> To: Mike Lawrence; r-help at lists.R-project.org
>> Subject: Re: [R] Help optimizing EMD::extrema()
>>
>> > -----Original Message-----
>> > From: r-help-bounces at r-project.org
>> > [mailto:r-help-bounces at r-project.org] On Behalf Of Mike Lawrence
>> > Sent: Friday, February 11, 2011 9:28 AM
>> > To: r-help at lists.R-project.org
>> > Subject: [R] Help optimizing EMD::extrema()
>> >
>> > Hi folks,
>> >
>> > I'm attempting to use the EMD package to analyze some neuroimaging
>> > data (timeseries with 64 channels sampled across 1 million
>> time points
>> > within each of 20 people). I found that processing a single
>> channel of
>> > data using EMD::emd() took about 8 hours. Exploration using Rprof()
>> > suggested that most of the compute time was spent in EMD::extrema().
>> > Looking at the code for EMD:extrema(), I managed to find one obvious
>> > speedup (switching from employing rbind() to c()) and I suspect that
>> > there may be a way to further speed things up by pre-allocating all
>> > the objects that are currently being created with c(), but
>> I'm having
>> > trouble understanding the code sufficiently to know
>> when/where to try
>> > this and what sizes to set as the default pre-allocation
>> length. Below
>> > I include code that demonstrates the speedup I achieved by
>> eliminating
>> > calls to rbind(), and also demonstrates that only a few calls to c()
>> > seem to be responsible for most of the compute time. The files
>> > "extrema_c.R" and "extrema_c2.R" are available at:
>> > https://gist.github.com/822691
>>
>> Try the following new.extrema().  It is based on
>> looking at runs in the data in a vectorized way.
>> On my old laptop the running times for length(x)=2^(2:118)
>> with EMD::extrema and new.extrema are
>>        old.time new.time
>> 4          0.00     0.00
>> 8          0.00     0.00
>> 16         0.00     0.00
>> 32         0.00     0.00
>> 64         0.00     0.00
>> 128        0.00     0.00
>> 256        0.00     0.00
>> 512        0.02     0.00
>> 1024       0.03     0.00
>> 2048       0.06     0.01
>> 4096       0.14     0.00
>> 8192       0.37     0.02
>> 16384      1.08     0.03
>> 32768      3.64     0.06
>> 65536     13.35     0.12
>> 131072    48.42     0.25
>> 262144   206.17     0.59
>>
>> isEndOfStrictlyIncreasingRun <- function(x) {
>>     c(FALSE, diff(diff(x) > 0) < 0, FALSE)
>> }
>>
>> isStartOfStrictlyIncreasingRun <- function(x) {
>>     c(FALSE, diff(diff(x) <= 0) < 0, FALSE)
>> }
>>
>> isEndOfStrictlyDecreasingRun <- function(x) {
>>     isEndOfStrictlyIncreasingRun(-x)
>> }
>>
>> isStartOfStrictlyDecreasingRun <- function(x) {
>>     isStartOfStrictlyIncreasingRun(-x)
>> }
>>
>> isStartOfZeroRun <- function(x) {
>>     x==0 & c(TRUE, x[-length(x)]!=0)
>> }
>>
>> nToEndOfCurrentFlatRun <- function(x) {
>>     # for each element of x, how far to end of current
>>     # run of equal values.
>>     rev( sequence(rle(rev(x))$lengths) - 1L)
>> }
>>
>> indexOfEndOfCurrentFlatRun <- function(x) {
>>     # should be a more direct way of doing this, but this is pretty
>> quick
>>     seq_len(length(x)) + nToEndOfCurrentFlatRun(x)
>> }
>>
>> new.extrema <- function(x) {
>>     f <- indexOfEndOfCurrentFlatRun(x)
>>     isMaxStart <- isEndOfStrictlyIncreasingRun(x) &
>> isStartOfStrictlyDecreasingRun(x)[f]
>>     maxindex <- cbind(which(isMaxStart), f[isMaxStart],
>> deparse.level=0)
>>
>>     isMinStart <- isEndOfStrictlyDecreasingRun(x) &
>> isStartOfStrictlyIncreasingRun(x)[f]
>>     minindex <- cbind(which(isMinStart), f[isMinStart],
>> deparse.level=0)
>>
>>
>>     # zero-crossings are bit weird: Report either the
>> before-zero/after-zero
>>     # pair or the start and stop of a a run of zero's (even if the run
>> is
>>     # not part of an actual zero-crossing).  Do them separately then
>> sort.
>>     # Also, if the entire sequence never actually crosses 0, do
>>     # not report the zero-touchings.
>>     # Also, if length(x)==2, the original doesn't report a
>> zero-crossing
>> or
>>     # a zero run.  We do not copy that.
>>     if (max(x) > 0 && min(x) < 0) {
>>         indexOfZeroCrossingStart <- which(c(abs(diff(sign(x)))==2,
>> FALSE))
>>         indexOfZeroCrossingEnd <- indexOfZeroCrossingStart + 1L
>>         indexOfZeroRunStart <- which(isStartOfZeroRun(x))
>>         indexOfZeroRunEnd <- f[indexOfZeroRunStart]
>>         crossStart <- c(indexOfZeroCrossingStart, indexOfZeroRunStart)
>>         cross <- cbind(crossStart, c(indexOfZeroCrossingEnd,
>> indexOfZeroRunEnd), deparse.level=0)[order(crossStart),, drop=FALSE]
>>     } else {
>>         cross <- cbind(integer(), integer())
>>     }
>>     # extrema likes to return NULL instead of a zero-row matrix,
>>     # so we follow it.  Zero-row matrices are easier to deal with.
>>     list(
>>          minindex=if (nrow(minindex)) minindex else NULL,
>>          maxindex=if (nrow(maxindex)) maxindex else NULL,
>>          nextreme=nrow(minindex) + nrow(maxindex),
>>          cross=if(nrow(cross)) cross else NULL,
>>          ncross=nrow(cross) # extrema() returns numeric, not integer,
>> here
>>     )
>> }
>>
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com
>>
>> >
>> > Any suggestions/help would be greatly appreciated.
>> >
>> >
>> > #load the EMD library for the default version of extrema
>> > library(EMD)
>> >
>> > #some data to process
>> > values = rnorm(1e4)
>> >
>> > #profile the default version of extrema
>> > Rprof(tmp <- tempfile())
>> > temp = extrema(values)
>> > Rprof()
>> > summaryRprof(tmp)
>> > #1.2s total with most time spend doing rbind
>> > unlink(tmp)
>> >
>> > #load a rbind-free version of extrema
>> > source('extrema_c.R')
>> > Rprof(tmp <- tempfile())
>> > temp = extrema_c(values)
>> > Rprof()
>> > summaryRprof(tmp) #much faster! .5s total
>> > unlink(tmp)
>> >
>> > #still, it encounters slowdowns with lots of data
>> > values = rnorm(1e5)
>> > Rprof(tmp <- tempfile())
>> > temp = extrema_c(values)
>> > Rprof()
>> > summaryRprof(tmp)
>> > #44s total, hard to see what's taking up so much time
>> > unlink(tmp)
>> >
>> > #load an rbind-free version of extrema that labels each call to c()
>> > source('extrema_c2.R')
>> > Rprof(tmp <- tempfile())
>> > temp = extrema_c2(values)
>> > Rprof()
>> > summaryRprof(tmp)
>> > #same time as above, but now we see that it spends more time in
>> > certain calls to c() than others
>> > unlink(tmp)
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>



More information about the R-help mailing list