[R] finding peaks in a simple dataset with R

Gabor Grothendieck ggrothendieck at gmail.com
Wed Nov 23 22:51:17 CET 2005


One idea might be, rather than have a peaks function, enhance
rle so that it optionally produces a third component with the
peak information, perhaps 1, 0, -1 for peak, neither and trough.
This would avoid any problems with ties since the output of rle is
based on runs.

On 11/23/05, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> On 11/23/05, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
> > >>>>> "Marc" == Marc Kirchner <marc.kirchner at iwr.uni-heidelberg.de>
> > >>>>>     on Wed, 23 Nov 2005 14:33:28 +0000 writes:
> >
> >    >>
> >    >> I wonder if we shouldn't polish that a bit and add to R's
> >    >> standard 'utils' package.
> >    >>
> >
> >    Marc> Hm, I figured out there are (at least) two versions out there, one being
> >    Marc> the "original" idea and a modification.
> >
> >    Marc> === Petr Pikal in 2001 (based on Brian Ripley's idea)==
> >    Marc> peaks <- function(series, span=3) {
> >    Marc> z <- embed(series, span)
> >    Marc> result <- max.col(z) == 1 + span %/% 2
> >    Marc> result
> >    Marc> }
> >
> >    Marc> versus
> >
> >    Marc> === Petr Pikal in 2004 ==
> >    Marc> peaks2<-function(series,span=3) {
> >    Marc> z <- embed(series, span)
> >    Marc> s <- span%/%2
> >    Marc> v<- max.col(z) == 1 + s
> >    Marc> result <- c(rep(FALSE,s),v)
> >    Marc> result <- result[1:(length(result)-s)]
> >    Marc> result
> >    Marc> }
> >
> > Thank you, Marc,
> >
> >    Marc> Comparison shows
> >    >> peaks(c(1,4,1,1,6,1,5,1,1),3)
> >    Marc> [1]  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE
> >    Marc> which is a logical vector for elements 2:N-1 and
> >
> >    >> peaks2(c(1,4,1,1,6,1,5,1,1),3)
> >    Marc> [1] FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE
> >    Marc> which is a logical vector for elements 1:N-2.
> >
> >    Marc> As I would expect to "lose" (span-1)/2 elements on each side
> >    Marc> of the vector, to me the 2001 version feels more natural.
> >
> > I think for the function to be more useful it the result should
> > have the original vector length and hence I'd propose to also
> > pad with FALSE at the upper end.
> >
> >    Marc> Also, both "suffer" from being non-deterministic in the
> >    Marc> multiple-maxima-case (the two 4s here)
> >
> > yes, of course, because of max.col().
> > Note that Venables & Ripley would consider this to be rather a feature.
> >
> >    Marc> This could (should?) be fixed by modifying the call to max.col()
> >    Marc> result <- max.col(z, "first") == 1 + span %/% 2;
> >
> > Actually I think it should become an option, but I'd use "first"
> > as default.
> >
> >    Marc> Just my two cents,
> >
> > Thank you.
> >
> > Here is my current proposal which also demonstrates why it's
> > useful to pad with FALSE :
> >
> > peaks <-function(series, span = 3, ties.method = "first") {
> >    if((span <- as.integer(span)) %% 2 != 1) stop("'span' must be odd")
> >    z <- embed(series, span)
> >    s <- span %/% 2
> >    v <- max.col(z, ties.method = ties.method) == s + 1:1
> >    pad <- rep(FALSE, s)
> >    c(pad, v, pad)
> > }
> >
> > y <- c(1,4,1,1,6,1,5,1,1) ; (ii <- which(peaks(y))); y[ii]
> > ##- [1] 2 5 7
> > ##- [1] 4 6 5
> >
> > set.seed(7)
> > y <- rpois(100, lambda = 7)
> > py <- peaks(y)
> > plot(y, type="o", cex = 1/4, main = "y and peaks(y,3)")
> > points(seq(y)[py], y[py], col = 2, cex = 1.5)
> >
> > p7 <- peaks(y,7)
> > points(seq(y)[p7], y[p7], col = 3, cex = 2)
> > mtext("peaks(y,7)", col=3)
>
> I think ties are still problems with this approach
> as:
>
> set.seed(1)
> peaks( c(1,2,2,2,3), 3 )
>
> gives a peak in the 2,2,2 stretch.
>
> Also NA would seem to be a better pad than false or maybe
> it should be specifiable including whether there is padding at
> all.   The zoo rapply and rollmax functions which can also be used
> to specify a similar naive peak function have na.pad=
> and align= arguments.  Also any peaks function should
> be generic so that various time series classes can implement
> their own methods and in the case of irregularly spaced series
> note that there are two possibilities for span, the time distance
> and the number of points, and they are not the same.
>
> It might also be nice to be able to get back peaks, troughs or
> both via  1,0,-1 in the output.
>




More information about the R-help mailing list