[R] finding peaks in a simple dataset with R
Martin Maechler
maechler at stat.math.ethz.ch
Thu Nov 24 16:06:52 CET 2005
>>>>> "Gabor" == Gabor Grothendieck <ggrothendieck at gmail.com>
>>>>> on Wed, 23 Nov 2005 16:51:17 -0500 writes:
Gabor> One idea might be, rather than have a peaks function, enhance
Gabor> rle so that it optionally produces a third component with the
Gabor> peak information, perhaps 1, 0, -1 for peak, neither and trough.
Gabor> This would avoid any problems with ties since the output of rle is
Gabor> based on runs.
that's an interesting idea. Contributions?
{see also my comments further below}
Gabor> On 11/23/05, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
>> On 11/23/05, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
>> > >>>>> "Marc" == Marc Kirchner <marc.kirchner at iwr.uni-heidelberg.de>
>> > >>>>> on Wed, 23 Nov 2005 14:33:28 +0000 writes:
>> >
>> > >>
>> > >> I wonder if we shouldn't polish that a bit and add to R's
>> > >> standard 'utils' package.
>> > >>
>> >
>> > Marc> Hm, I figured out there are (at least) two versions out there, one being
>> > Marc> the "original" idea and a modification.
>> >
>> > Marc> === Petr Pikal in 2001 (based on Brian Ripley's idea)==
>> > Marc> peaks <- function(series, span=3) {
>> > Marc> z <- embed(series, span)
>> > Marc> result <- max.col(z) == 1 + span %/% 2
>> > Marc> result
>> > Marc> }
>> >
>> > Marc> versus
>> >
>> > Marc> === Petr Pikal in 2004 ==
>> > Marc> peaks2<-function(series,span=3) {
>> > Marc> z <- embed(series, span)
>> > Marc> s <- span%/%2
>> > Marc> v<- max.col(z) == 1 + s
>> > Marc> result <- c(rep(FALSE,s),v)
>> > Marc> result <- result[1:(length(result)-s)]
>> > Marc> result
>> > Marc> }
>> >
>> > Thank you, Marc,
>> >
>> > Marc> Comparison shows
>> > >> peaks(c(1,4,1,1,6,1,5,1,1),3)
>> > Marc> [1] TRUE FALSE FALSE TRUE FALSE TRUE FALSE
>> > Marc> which is a logical vector for elements 2:N-1 and
>> >
>> > >> peaks2(c(1,4,1,1,6,1,5,1,1),3)
>> > Marc> [1] FALSE TRUE FALSE FALSE TRUE FALSE TRUE
>> > Marc> which is a logical vector for elements 1:N-2.
>> >
>> > Marc> As I would expect to "lose" (span-1)/2 elements on each side
>> > Marc> of the vector, to me the 2001 version feels more natural.
>> >
>> > I think for the function to be more useful it the result should
>> > have the original vector length and hence I'd propose to also
>> > pad with FALSE at the upper end.
>> >
>> > Marc> Also, both "suffer" from being non-deterministic in the
>> > Marc> multiple-maxima-case (the two 4s here)
>> >
>> > yes, of course, because of max.col().
>> > Note that Venables & Ripley would consider this to be rather a feature.
>> >
>> > Marc> This could (should?) be fixed by modifying the call to max.col()
>> > Marc> result <- max.col(z, "first") == 1 + span %/% 2;
>> >
>> > Actually I think it should become an option, but I'd use "first"
>> > as default.
>> >
>> > Marc> Just my two cents,
>> >
>> > Thank you.
>> >
>> > Here is my current proposal which also demonstrates why it's
>> > useful to pad with FALSE :
>> >
>> > peaks <-function(series, span = 3, ties.method = "first") {
>> > if((span <- as.integer(span)) %% 2 != 1) stop("'span' must be odd")
>> > z <- embed(series, span)
>> > s <- span %/% 2
>> > v <- max.col(z, ties.method = ties.method) == s + 1:1
>> > pad <- rep(FALSE, s)
>> > c(pad, v, pad)
>> > }
>> >
>> > y <- c(1,4,1,1,6,1,5,1,1) ; (ii <- which(peaks(y))); y[ii]
>> > ##- [1] 2 5 7
>> > ##- [1] 4 6 5
>> >
>> > set.seed(7)
>> > y <- rpois(100, lambda = 7)
>> > py <- peaks(y)
>> > plot(y, type="o", cex = 1/4, main = "y and peaks(y,3)")
>> > points(seq(y)[py], y[py], col = 2, cex = 1.5)
>> >
>> > p7 <- peaks(y,7)
>> > points(seq(y)[p7], y[p7], col = 3, cex = 2)
>> > mtext("peaks(y,7)", col=3)
>>
>> I think ties are still problems with this approach
>> as:
>>
>> set.seed(1)
[that's not needed for the above peaks()!]
>> peaks( c(1,2,2,2,3), 3 )
>>
>> gives a peak in the 2,2,2 stretch.
Yes, thank you for the example, I've noticed similar behavior in
the plots I gave above.
>> Also NA would seem to be a better pad than false or maybe
>> it should be specifiable including whether there is
>> padding at all.
NA's have the big drawback of producing a logical vector that
can NOT be used for subsetting -- and subsetting was exactly a main
reason for the padding...
I agree however that an option to "not pad" is sensible.
>> The zoo rapply and rollmax functions which can also be used
>> to specify a similar naive peak function have na.pad=
>> and align= arguments. Also any peaks function should
>> be generic so that various time series classes can implement
>> their own methods and in the case of irregularly spaced series
>> note that there are two possibilities for span, the time distance
>> and the number of points, and they are not the same.
>>
>> It might also be nice to be able to get back peaks, troughs or
>> both via 1,0,-1 in the output.
>>
Gabor> ______________________________________________
Gabor> R-help at stat.math.ethz.ch mailing list
Gabor> https://stat.ethz.ch/mailman/listinfo/r-help
Gabor> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
More information about the R-help
mailing list