Running Median and Mean
Martin Maechler
Martin Maechler <maechler@stat.math.ethz.ch>
Mon, 16 Sep 2002 18:33:22 +0200
Thank you, David, for keeping the topic alive!
>>>>> "Greg" == Warnes, Gregory R <gregory_r_warnes@groton.pfizer.com>
>>>>> on Mon, 16 Sep 2002 12:13:22 -0400 writes:
Greg> I am willing to add these functions into the gregmisc
Greg> package to complement the already-existing 'running' function.
{David's a package too}
Note that I said originally,
that I have had code for years (available on request) and
planned to add these to "modreg" (one of the standard packages).
Further note that I was talking about two algorithms
1) something along the lines David is using,
where I got the original algorithm from Werner Stuetzle from
in Fortran (from the 80's).
2) the optimal algorithm of Haerdle & Steiner (and Turlach)
which can be another order of magnitude faster.
--> Ref. @article{hae:stei:1995,
Author = {H\"ardle, W. and Steiger, W.},
Title = {{O}ptimal Median Smoothing [{Algorithm AS} 296]},
Year = 1995,
Journal = {Applied Statistics},
Volume = 44,
Pages = {258--264},
}
I do agree I should have published the code much earlier...
Martin Maechler <maechler@stat.math.ethz.ch> http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum LEO C16 Leonhardstr. 27
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-3408 fax: ...-1228 <><
>> -----Original Message----- From: David Brahm
>> [mailto:brahm@alum.mit.edu] Sent: Monday, September 16,
>> 2002 11:10 AM To: r-devel@stat.math.ethz.ch Cc:
>> maechler@stat.math.ethz.ch; Ray.Brownrigg@mcs.vuw.ac.nz
>> Subject: Running Median and Mean
>>
>>
>> R gurus,
>>
>> On Aug 20, 2002, I asked in R-help about calculating a
>> running 5-day median on a large matrix. Thanks to Martin
>> Maechler <maechler@stat.math.ethz.ch> and Ray Brownrigg
>> <Ray.Brownrigg@mcs.vuw.ac.nz> for responding.
>>
>> I ended up writing C code (and an R interface) to do it,
>> which is about 1000x faster than the naive method! (72s
>> became .09s on a 223 x 520 matrix). I added a running
>> mean function too, which is very simple and is about 15x
>> faster than apply(..., filter, ...). I've packaged the
>> code as
>> <http://cran.r-project.org/src/contrib/Devel/runStat_1.1.tar.gz>
>>
>> The key insight was that it *is* in fact a good idea to
>> calculate the ranks of the 5 data points, because then it
>> is easy to calculate how those ranks *change* as you step
>> through the column. For example, suppose the last few
>> points in a column look like: data= 20 25 18 14 78 55 29
>> rank= 3 2 1 5 4 where I have ranked the 5 points prior to
>> the last one. The median is 25 (the point with rank 3).
>> Stepping to the left: data= 20 25 18 14 78 55 29 rank= 3
>> 4 2 1 5 For the 4 points that overlap, their ranks remain
>> the same, except they are incremented by one for points
>> >=20, and decremented by one for points >55. The rank of
>> the new point (20) is just 5 minus the number of
>> increments done.
>>
>> The code is copied below for your amusement. Suggestions
>> welcome. Note I have not dealt with NA's, even values of
>> N, row-wise medians, or non-lagged medians (i.e. I put
>> the median of points 1:5 into position 6, not position
>> 5). It would be great if somebody (Martin?) wanted to
>> incorporate these functions into an existing package or
>> even "base"; otherwise I'll post the final product on
>> CRAN/src/contrib. -- David Brahm (brahm@alum.mit.edu)
>>
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._