[R] Survey of "moving window" statistical functions - still looking f or fast mad function
Gabor Grothendieck
ggrothendieck at gmail.com
Sat Apr 2 03:40:53 CEST 2005
Jaroslaw's article was great. In fact, it was used as the basis for
rapply and some optimized special cases that will be included in
the R 2.1.0 version of zoo (which have been coded but not yet
released).
Regarding numerically stable summation, check out the idea
behind the following which I coincidentally am also considering
for the zoo implementation:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/393090
On Apr 1, 2005 8:07 PM, Vadim Ogranovich <vograno at evafunds.com> wrote:
> Hi,
>
> First, let me thank Jaroslaw for making this survey. I find it quite
> illuminating.
>
> Now the questions:
>
> * the #1 solution below (based on cumsum) is numerically unstable.
> Specifically if you do the runmean on a positive vector you can easily
> get negative numbers due to rounding errors. Does anyone see a
> modification which is free of this deficiency?
> * is it possible to optimize the algorithm of the filter function,
> solution #2 below, for the case of the rep(1/k,k) kernel?
>
> Thanks,
> Vadim
>
> [R] Survey of "moving window" statistical functions - still looking f or
> fast mad function
>
> * This message: [ Message body
> <http://tolstoy.newcastle.edu.au/R/help/04/10/5161.html#start> ] [ More
> options
> <http://tolstoy.newcastle.edu.au/R/help/04/10/5161.html#options2> ]
> * Related messages: [ Next message
> <http://tolstoy.newcastle.edu.au/R/help/04/10/5162.html> ] [ Previous
> message <http://tolstoy.newcastle.edu.au/R/help/04/10/5160.html> ] [
> Next in thread <http://tolstoy.newcastle.edu.au/R/help/04/10/5167.html>
> ] [ Replies
> <http://tolstoy.newcastle.edu.au/R/help/04/10/5161.html#replies> ]
>
> From: Tuszynski, Jaroslaw W. <JAROSLAW.W.TUSZYNSKI_at_saic.com
> <mailto:JAROSLAW.W.TUSZYNSKI_at_saic.com?Subject=Re:%20%5BR%5D%20Survey%
> 20of%20"moving%20window"%20statistical%20functions%20-%20still
> %20lookingf%20or%20fast%20mad%20function> >
> Date: Sat 09 Oct 2004 - 06:30:32 EST
>
> Hi,
>
> Lately I run into a problem that my code R code is spending hours
> performing simple moving window statistical operations. As a result I
> did searched archives for alternative (faster) ways of performing: mean,
> max, median and mad operation over moving window (size 81) on a vector
> with about 30K points. And performed some timing for several ways that
> were suggested, and few ways I come up with. The purpose of this email
> is to share some of my findings and ask for more suggestions (especially
> about moving mad function).
>
> Sum over moving window can be done using many different ways. Here are
> some sorted from the fastest to the slowest:
>
> 1. runmean = function(x, k) { n = length(x) y = x[ k:n ] - x[
> c(1,1:(n-k)) ] # this is a difference from the previous cell y[1] =
> sum(x[1:k]); # find the first sum y = cumsum(y) # apply precomputed
> differences return(y/k) # return mean not sum }
> 2. filter(x, rep(1/k,k), sides=2, circular=T) - (stats package)
> 3. kernapply(x, kernel("daniell", m), circular=T)
> 4. apply(embed(x,k), 1, mean)
> 5. mywinfun <- function(x, k, FUN=mean, ...) { # suggested in news
> group n <- length(x) A <- rep(x, length=k*(n+1)) dim(A) <- c(n+1, k)
> sapply(split(A, row(A)), FUN, ...)[1:(n-k+1)] }
> 6. rollFun(x, k, FUN=mean) - (fSeries package)
> 7. rollMean(x, k) - (fSeries package)
> 8. SimpleMeanLoop = function(x, k) { n = length(x) # simple-minded
> loop used as a baseline y = rep(0, n) k = k%/%2; for (i in (1+k):(n-k))
> y[i] = mean(x[(i-k):(i+k)]) }
> 9. running(x, fun=mean, width=k) - (gtools package)
>
> Some of above functions return results that are the same length as x and
> some return arrays with length n-k+1. The relative speeds (on Windows
> machine) were as follow: 0.01, 0.09, 1.2, 8.1, 11.2, 13.4, 27.3, 63,
> 345. As one can see there are about 5 orders of magnitude between the
> fastest and the slowest.
>
> Maximum over moving window can be done as follow, in order of speed
>
> 1. runmax = function(x, k) { n = length(x) y = rep(0, n) m = k%/%2;
> a = 0; for (i in (1+m):(n-m)) { if (a==y[i-1]) y[i] =
> max(x[(i-m):(i+m)]) # calculate max of the window else y[i] =
> max(y[i-1], x[i+m]); # max of the window is =y[i-1] a = x[i-m] # point
> that will be removed from the window } return(y) }
> 2. apply(embed(x,k), 1, max)
> 3. SimpleMaxLoop(x, k) - similar to SimpleMeanLoop above
> 4. mywinfun(x, k, FUN=max) - see above
> 5. rollFun(x, k, FUN=max) - fSeries package
> 6. rollMax(x, k) - fSeries package
> 7. running(x, fun=max, width=k) - gtools package The relative
> speeds were: <0.01, 3, 3.4, 5.3, 7.5, 7.7, 15.3
>
> Median over moving window can be done as follows:
>
> 1. runmed(x, k) - from stats package
> 2. SimpleMedLoop(x, k) - similar to SimpleMeanLoop above
> 3. apply(embed(x,k), 1, median)
> 4. mywinfun(x, k, FUN=median) - see above
> 5. rollFun (x, k, FUN=median) - fSeries package
> 6. running(x, fun=max, width=k) - gtools package Speeds: <0.01,
> 3.4, 9, 15, 29, 165
>
> Mad over moving window can be done as follows:
>
> 1. runmad = function(x, k) { n = length(x) A = embed(x,k) A = abs(A
> - rep(apply(A, 1, median), k)) dim(A) = c(n-k+1, k) apply(A, 1, median)
> }
> 2. apply(embed(x,k), 1, mad)
> 3. mywinfun(x, k, FUN=mad) - see above
> 4. SimpleMadLoop(x, k) - similar to SimpleMeanLoop above
> 5. rollFun(x, k, FUN=mad) - fSeries package
> 6. running(x, fun=mad, width=k) - gtools package Speeds: 11, 18,
> 25, 50, 50, 400
>
> Some thoughts about those results:
>
> * All functions from Stats package (runmed, filter, kernapply) are
> fast and hard to improve on.
> * In case of Mean and Max a simple un-optimized R codes are much
> faster than specialized functions build for the same purpose.
> * apply(embed(x,k), 1, fun) - seem to be always faster than any
> functions from fSeries package or "mywinfun"
> * "running" function from gtools package is horribly slow compared
> to anything else
> * "mywinfun" proposed as a faster version of "apply(embed(x,k), 1,
> fun)" seems to be always slower
>
> Finally a question: I still need to get moving windows mad
> function faster my "runmad" function is not that much faster than
> apply/embed combo, and that I used before, and this is where my code
> spends most of its time. I need something like "runmed" but for a mad
> function. Any suggestions?
>
> Jarek
>
> =====================================\====
> Jarek Tuszynski, PhD. o / \
> Science Applications International Corporation <\__,|
> (703) 676-4192 "> \
> Jaroslaw.W.Tuszynski at saic.com ` \
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting
> guide! http://www.R-project.org/posting-guide.html Received on Sat Oct
> 09 06:43:05 2004
>
> * This message: [ Message body
> <http://tolstoy.newcastle.edu.au/R/help/04/10/5161.html#start> ]
> * Next message: Emili Tortosa-Ausina: "Re: [R] RWinEdt"
> <http://tolstoy.newcastle.edu.au/R/help/04/10/5162.html>
> * Previous message: Brian S Cade: "Re: [R] reading Systat into R"
> <http://tolstoy.newcastle.edu.au/R/help/04/10/5160.html>
> * Next in thread: Prof Brian Ripley: "Re: [R] Survey of "moving
> window" statistical functions - still looking f or fast mad function"
> <http://tolstoy.newcastle.edu.au/R/help/04/10/5167.html>
> * Reply: Prof Brian Ripley: "Re: [R] Survey of "moving window"
> statistical functions - still looking f or fast mad function"
> <http://tolstoy.newcastle.edu.au/R/help/04/10/5167.html>
> * Reply: Prof Brian Ripley: "Re: [R] Survey of "moving window"
> statistical functions - still looking f or fast mad function"
> <http://tolstoy.newcastle.edu.au/R/help/04/10/5167.html>
>
> * Contemporary messages sorted: [ By Date
> <http://tolstoy.newcastle.edu.au/R/help/04/10/date.html#5161> ] [ By
> Thread <http://tolstoy.newcastle.edu.au/R/help/04/10/index.html#5161> ]
> [ By Subject
> <http://tolstoy.newcastle.edu.au/R/help/04/10/subject.html#5161> ] [ By
> Author <http://tolstoy.newcastle.edu.au/R/help/04/10/author.html#5161>
> ] [ By messages with attachments
> <http://tolstoy.newcastle.edu.au/R/help/04/10/attachment.html> ]
>
> This archive was generated by hypermail 2.1.8
> <http://www.hypermail.org/> : Fri 18 Mar 2005 - 03:03:35 EST
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>
More information about the R-help
mailing list