[R] Help optimizing EMD::extrema()
Mike Lawrence
Mike.Lawrence at dal.ca
Mon Feb 14 04:07:39 CET 2011
Wow, great work! This makes a huge difference (i.e. 5 mins to process
an electrode instead of 5 hours).
I believe that ndata and ndatam1 are there to let EMD::emd() and
EMD::extractimf() (the functions that call EMD::extrema()) implement
various boundary corrections. I personally don't intend to examine
the start or end in great detail, so I think the uncorrected version
is fine for my application.
Thanks again!
Mike
On Sun, Feb 13, 2011 at 6:57 PM, William Dunlap <wdunlap at tibco.com> wrote:
> I did not try to emulate the ndata and ndatam1 arguments
> to extrema(), as I didn't see what they were for.  The
> help file simply says they are the length of the first
> argument and that length minus 1, and those are their
> default values.  If they do not have their default values
> then extrema() frequently dies.  You could add them
> to the argument list and not use them, or check that
> they are what they default to, as in
> function(x, ndata, ndatam1) {
>     stopifnot(length(x)==ndata, ndata-1==ndatam1)
>     ... rest of code ...
> If the check fails then someone needs to say or figure out
> what they are for.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
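
[A concrete version of the check Bill describes, not from the thread: the
wrapper name is made up, the first argument is called y only for
illustration, and the defaults mirror what the EMD help page says
(length of the first argument and that minus 1).  It delegates to the
new.extrema() defined in the message quoted further below.]

extrema_checked <- function(y, ndata = length(y), ndatam1 = ndata - 1) {
  # refuse non-default values rather than trying to emulate them
  stopifnot(length(y) == ndata, ndata - 1 == ndatam1)
  new.extrema(y)
}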
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org
>> [mailto:r-help-bounces at r-project.org] On Behalf Of William Dunlap
>> Sent: Sunday, February 13, 2011 2:08 PM
>> To: Mike Lawrence; r-help at lists.R-project.org
>> Subject: Re: [R] Help optimizing EMD::extrema()
>>
>> > -----Original Message-----
>> > From: r-help-bounces at r-project.org
>> > [mailto:r-help-bounces at r-project.org] On Behalf Of Mike Lawrence
>> > Sent: Friday, February 11, 2011 9:28 AM
>> > To: r-help at lists.R-project.org
>> > Subject: [R] Help optimizing EMD::extrema()
>> >
>> > Hi folks,
>> >
>> > I'm attempting to use the EMD package to analyze some neuroimaging
>> > data (time series with 64 channels sampled across 1 million time
>> > points within each of 20 people). I found that processing a single
>> > channel of data using EMD::emd() took about 8 hours. Exploration
>> > using Rprof() suggested that most of the compute time was spent in
>> > EMD::extrema(). Looking at the code for EMD::extrema(), I managed to
>> > find one obvious speedup (switching from rbind() to c()), and I
>> > suspect that there may be a way to further speed things up by
>> > pre-allocating all the objects that are currently being created with
>> > c(), but I'm having trouble understanding the code sufficiently to
>> > know when/where to try this and what sizes to set as the default
>> > pre-allocation length. Below I include code that demonstrates the
>> > speedup I achieved by eliminating calls to rbind(), and also
>> > demonstrates that only a few calls to c() seem to be responsible for
>> > most of the compute time. The files "extrema_c.R" and "extrema_c2.R"
>> > are available at:
>> > https://gist.github.com/822691
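
[An aside, not part of either message: the pre-allocation idea mentioned
above is the usual R pattern of allocating the full result once instead of
growing it with c() inside a loop.  A toy sketch with made-up function
names:]

grow_with_c <- function(n) {
  out <- integer(0)
  for (i in seq_len(n)) out <- c(out, i)   # copies 'out' on every iteration
  out
}
grow_preallocated <- function(n) {
  out <- integer(n)                        # allocate once
  for (i in seq_len(n)) out[i] <- i        # fill in place
  out
}
# system.time(grow_with_c(1e5))          # noticeably slower as n grows
# system.time(grow_preallocated(1e5))    # scales roughly linearly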
>>
>> Try the following new.extrema(). It is based on
>> looking at runs in the data in a vectorized way.
>> On my old laptop the running times for length(x) = 2^(2:18)
>> with EMD::extrema and new.extrema are
>>          n  old.time  new.time
>>          4      0.00      0.00
>>          8      0.00      0.00
>>         16      0.00      0.00
>>         32      0.00      0.00
>>         64      0.00      0.00
>>        128      0.00      0.00
>>        256      0.00      0.00
>>        512      0.02      0.00
>>       1024      0.03      0.00
>>       2048      0.06      0.01
>>       4096      0.14      0.00
>>       8192      0.37      0.02
>>      16384      1.08      0.03
>>      32768      3.64      0.06
>>      65536     13.35      0.12
>>     131072     48.42      0.25
>>     262144    206.17      0.59
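
[Illustration, not part of Bill's message: the "runs in the data in a
vectorized way" idea behind the helpers below can be seen on a toy vector.
diff(x) > 0 flags strictly increasing steps; a drop in that logical
sequence marks the end of a strictly increasing run.]

x <- c(1, 3, 5, 5, 2, 0, 4)
diff(x) > 0
#  TRUE  TRUE FALSE FALSE FALSE  TRUE
c(FALSE, diff(diff(x) > 0) < 0, FALSE)
# FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
# i.e. x[3] ends the increasing run 1, 3, 5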
>>
>> isEndOfStrictlyIncreasingRun <- function(x) {
>>   c(FALSE, diff(diff(x) > 0) < 0, FALSE)
>> }
>>
>> isStartOfStrictlyIncreasingRun <- function(x) {
>>   c(FALSE, diff(diff(x) <= 0) < 0, FALSE)
>> }
>>
>> isEndOfStrictlyDecreasingRun <- function(x) {
>>   isEndOfStrictlyIncreasingRun(-x)
>> }
>>
>> isStartOfStrictlyDecreasingRun <- function(x) {
>>   isStartOfStrictlyIncreasingRun(-x)
>> }
>>
>> isStartOfZeroRun <- function(x) {
>>   x==0 & c(TRUE, x[-length(x)]!=0)
>> }
>>
>> nToEndOfCurrentFlatRun <- function(x) {
>>   # for each element of x, how far to end of current
>>   # run of equal values.
>>   rev(sequence(rle(rev(x))$lengths) - 1L)
>> }
>>
>> indexOfEndOfCurrentFlatRun <- function(x) {
>>   # should be a more direct way of doing this, but this is pretty quick
>>   seq_len(length(x)) + nToEndOfCurrentFlatRun(x)
>> }
>>
>> new.extrema <- function(x) {
>>   f <- indexOfEndOfCurrentFlatRun(x)
>>   isMaxStart <- isEndOfStrictlyIncreasingRun(x) &
>>     isStartOfStrictlyDecreasingRun(x)[f]
>>   maxindex <- cbind(which(isMaxStart), f[isMaxStart], deparse.level=0)
>>
>>   isMinStart <- isEndOfStrictlyDecreasingRun(x) &
>>     isStartOfStrictlyIncreasingRun(x)[f]
>>   minindex <- cbind(which(isMinStart), f[isMinStart], deparse.level=0)
>>
>>   # Zero-crossings are a bit weird: report either the
>>   # before-zero/after-zero pair or the start and stop of a run of zeros
>>   # (even if the run is not part of an actual zero-crossing).  Do them
>>   # separately, then sort.
>>   # Also, if the entire sequence never actually crosses 0, do
>>   # not report the zero-touchings.
>>   # Also, if length(x)==2, the original doesn't report a zero-crossing
>>   # or a zero run.  We do not copy that.
>>   if (max(x) > 0 && min(x) < 0) {
>>     indexOfZeroCrossingStart <- which(c(abs(diff(sign(x)))==2, FALSE))
>>     indexOfZeroCrossingEnd <- indexOfZeroCrossingStart + 1L
>>     indexOfZeroRunStart <- which(isStartOfZeroRun(x))
>>     indexOfZeroRunEnd <- f[indexOfZeroRunStart]
>>     crossStart <- c(indexOfZeroCrossingStart, indexOfZeroRunStart)
>>     cross <- cbind(crossStart,
>>                    c(indexOfZeroCrossingEnd, indexOfZeroRunEnd),
>>                    deparse.level=0)[order(crossStart), , drop=FALSE]
>>   } else {
>>     cross <- cbind(integer(), integer())
>>   }
>>   # extrema() likes to return NULL instead of a zero-row matrix,
>>   # so we follow it.  Zero-row matrices are easier to deal with.
>>   list(
>>     minindex = if (nrow(minindex)) minindex else NULL,
>>     maxindex = if (nrow(maxindex)) maxindex else NULL,
>>     nextreme = nrow(minindex) + nrow(maxindex),
>>     cross    = if (nrow(cross)) cross else NULL,
>>     ncross   = nrow(cross)  # extrema() returns numeric, not integer, here
>>   )
>> }
>>
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com
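
[A quick check, not part of the original messages, assuming the EMD package
is installed and new.extrema() has been sourced as defined above: compare
the two implementations on random data and time them.  The result of
all.equal() is expected to be TRUE.]

library(EMD)
set.seed(1)
x <- rnorm(1e4)
old <- extrema(x)
new <- new.extrema(x)
# Compare the components the two results have in common; all.equal()
# tolerates the integer-vs-numeric difference Bill notes, and
# check.attributes = FALSE ignores any dimname differences.
nms <- intersect(names(old), names(new))
all.equal(old[nms], new[nms], check.attributes = FALSE)
# Rough timing comparison (numbers will vary by machine):
system.time(extrema(x))
system.time(new.extrema(x))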
>>
>> >
>> > Any suggestions/help would be greatly appreciated.
>> >
>> >
>> > #load the EMD library for the default version of extrema
>> > library(EMD)
>> >
>> > #some data to process
>> > values = rnorm(1e4)
>> >
>> > #profile the default version of extrema
>> > Rprof(tmp <- tempfile())
>> > temp = extrema(values)
>> > Rprof()
>> > summaryRprof(tmp)
>> > #1.2s total with most time spent doing rbind
>> > unlink(tmp)
>> >
>> > #load a rbind-free version of extrema
>> > source('extrema_c.R')
>> > Rprof(tmp <- tempfile())
>> > temp = extrema_c(values)
>> > Rprof()
>> > summaryRprof(tmp) #much faster! .5s total
>> > unlink(tmp)
>> >
>> > #still, it encounters slowdowns with lots of data
>> > values = rnorm(1e5)
>> > Rprof(tmp <- tempfile())
>> > temp = extrema_c(values)
>> > Rprof()
>> > summaryRprof(tmp)
>> > #44s total, hard to see what's taking up so much time
>> > unlink(tmp)
>> >
>> > #load an rbind-free version of extrema that labels each call to c()
>> > source('extrema_c2.R')
>> > Rprof(tmp <- tempfile())
>> > temp = extrema_c2(values)
>> > Rprof()
>> > summaryRprof(tmp)
>> > #same time as above, but now we see that it spends more time in
>> > #certain calls to c() than others
>> > unlink(tmp)
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>