[R] finding peaks in a simple dataset with R

Martin Maechler maechler at stat.math.ethz.ch
Wed Nov 23 17:10:44 CET 2005


>>>>> "Marc" == Marc Kirchner <marc.kirchner at iwr.uni-heidelberg.de>
>>>>>     on Wed, 23 Nov 2005 14:33:28 +0000 writes:

    >> 
    >> I wonder if we shouldn't polish that a bit and add to R's
    >> standard 'utils' package.
    >> 

    Marc> Hm, I figured out there are (at least) two versions out there, one being
    Marc> the "original" idea and a modification. 

    Marc> === Petr Pikal in 2001 (based on Brian Ripley's idea)==
    Marc> peaks <- function(series, span=3) {
    Marc> z <- embed(series, span)
    Marc> result <- max.col(z) == 1 + span %/% 2
    Marc> result
    Marc> }

    Marc> versus

    Marc> === Petr Pikal in 2004 ==
    Marc> peaks2<-function(series,span=3) {
    Marc> z <- embed(series, span)
    Marc> s <- span%/%2
    Marc> v<- max.col(z) == 1 + s
    Marc> result <- c(rep(FALSE,s),v)
    Marc> result <- result[1:(length(result)-s)]
    Marc> result
    Marc> } 

Thank you, Marc,

    Marc> Comparison shows
    >> peaks(c(1,4,1,1,6,1,5,1,1),3)
    Marc> [1]  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE
    Marc> which is a logical vector for elements 2:N-1 and

    >> peaks2(c(1,4,1,1,6,1,5,1,1),3)
    Marc> [1] FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE
    Marc> which is a logical vector for elements 1:N-2.

    Marc> As I would expect to "lose" (span-1)/2 elements on each side 
    Marc> of the vector, to me the 2001 version feels more natural.

I think for the function to be more useful it the result should
have the original vector length and hence I'd propose to also
pad with FALSE at the upper end.

    Marc> Also, both "suffer" from being non-deterministic in the 
    Marc> multiple-maxima-case (the two 4s here)

yes, of course, because of max.col().  
Note that Venables & Ripley would consider this to be rather a feature.

    Marc> This could (should?) be fixed by modifying the call to max.col()
    Marc> result <- max.col(z, "first") == 1 + span %/% 2;

Actually I think it should become an option, but I'd use "first"
as default.

    Marc> Just my two cents,

Thank you.

Here is my current proposal which also demonstrates why it's
useful to pad with FALSE :

peaks <-function(series, span = 3, ties.method = "first") {
    if((span <- as.integer(span)) %% 2 != 1) stop("'span' must be odd")
    z <- embed(series, span)
    s <- span %/% 2
    v <- max.col(z, ties.method = ties.method) == s + 1:1
    pad <- rep(FALSE, s)
    c(pad, v, pad)
}

y <- c(1,4,1,1,6,1,5,1,1) ; (ii <- which(peaks(y))); y[ii]
##- [1] 2 5 7
##- [1] 4 6 5

set.seed(7)
y <- rpois(100, lambda = 7)
py <- peaks(y)
plot(y, type="o", cex = 1/4, main = "y and peaks(y,3)")
points(seq(y)[py], y[py], col = 2, cex = 1.5)

p7 <- peaks(y,7)
points(seq(y)[p7], y[p7], col = 3, cex = 2)
mtext("peaks(y,7)", col=3)




More information about the R-help mailing list