[R] finding peaks in a simple dataset with R
Martin Maechler
maechler at stat.math.ethz.ch
Wed Nov 23 17:10:44 CET 2005
>>>>> "Marc" == Marc Kirchner <marc.kirchner at iwr.uni-heidelberg.de>
>>>>> on Wed, 23 Nov 2005 14:33:28 +0000 writes:
>>
>> I wonder if we shouldn't polish that a bit and add to R's
>> standard 'utils' package.
>>
Marc> Hm, I figured out there are (at least) two versions out there, one being
Marc> the "original" idea and a modification.
Marc> === Petr Pikal in 2001 (based on Brian Ripley's idea)==
Marc> peaks <- function(series, span=3) {
Marc> z <- embed(series, span)
Marc> result <- max.col(z) == 1 + span %/% 2
Marc> result
Marc> }
Marc> versus
Marc> === Petr Pikal in 2004 ==
Marc> peaks2<-function(series,span=3) {
Marc> z <- embed(series, span)
Marc> s <- span%/%2
Marc> v<- max.col(z) == 1 + s
Marc> result <- c(rep(FALSE,s),v)
Marc> result <- result[1:(length(result)-s)]
Marc> result
Marc> }
Thank you, Marc,
Marc> Comparison shows
>> peaks(c(1,4,1,1,6,1,5,1,1),3)
Marc> [1] TRUE FALSE FALSE TRUE FALSE TRUE FALSE
Marc> which is a logical vector for elements 2:N-1 and
>> peaks2(c(1,4,1,1,6,1,5,1,1),3)
Marc> [1] FALSE TRUE FALSE FALSE TRUE FALSE TRUE
Marc> which is a logical vector for elements 1:N-2.
Marc> As I would expect to "lose" (span-1)/2 elements on each side
Marc> of the vector, to me the 2001 version feels more natural.
I think for the function to be more useful it the result should
have the original vector length and hence I'd propose to also
pad with FALSE at the upper end.
Marc> Also, both "suffer" from being non-deterministic in the
Marc> multiple-maxima-case (the two 4s here)
yes, of course, because of max.col().
Note that Venables & Ripley would consider this to be rather a feature.
Marc> This could (should?) be fixed by modifying the call to max.col()
Marc> result <- max.col(z, "first") == 1 + span %/% 2;
Actually I think it should become an option, but I'd use "first"
as default.
Marc> Just my two cents,
Thank you.
Here is my current proposal which also demonstrates why it's
useful to pad with FALSE :
peaks <-function(series, span = 3, ties.method = "first") {
if((span <- as.integer(span)) %% 2 != 1) stop("'span' must be odd")
z <- embed(series, span)
s <- span %/% 2
v <- max.col(z, ties.method = ties.method) == s + 1:1
pad <- rep(FALSE, s)
c(pad, v, pad)
}
y <- c(1,4,1,1,6,1,5,1,1) ; (ii <- which(peaks(y))); y[ii]
##- [1] 2 5 7
##- [1] 4 6 5
set.seed(7)
y <- rpois(100, lambda = 7)
py <- peaks(y)
plot(y, type="o", cex = 1/4, main = "y and peaks(y,3)")
points(seq(y)[py], y[py], col = 2, cex = 1.5)
p7 <- peaks(y,7)
points(seq(y)[p7], y[p7], col = 3, cex = 2)
mtext("peaks(y,7)", col=3)
More information about the R-help
mailing list