[R] finding peaks in a simple dataset with R
Philippe Grosjean
phgrosjean at sciviews.org
Sun Nov 27 07:51:55 CET 2005
Hello,
This is interesting! Note that you have the turnpoints() function in
pastecs library. The function calculates all peaks and pits in a series.
It also calculates information of each turning point (peak or pit),
according to Kendall's information theory. It is then possible to filter
these peaks or pits at a certain level of significance. The later
function is not implemented in the pastecs package, but it can be done
easily (that one retreives ony peaks):
Peaks <- function(x, level = 0.05) {
if (!inherits(x, "turnpoints"))
stop("x must be a 'turnpoints' object!")
# Extract position and probability
tp.pos <- x$tppos
tp.proba <- x$proba
# We have both peaks and pits. Keep only peaks
keep <- 1:(x$nturns / 2) * 2
if (x$firstispeak) keep <- keep - 1
tp.pos <- tp.pos[keep]
tp.proba <- tp.proba[keep]
# Keep only peaks whose probability is lower than level
return(tp.pos[tp.proba < level])
}
# Now, your examples using turnpoints()
library(pastecs)
set.seed(7)
y <- rpois(100, lambda = 7)
tp <- turnpoints(y)
summary(tp)
# Take all peaks
p.all <- Peaks(tp, level = 1)
plot(y, type="o", cex = 1/4, main = "y and all peaks")
points(seq(y)[p.all], y[p.all], col = 2, cex = 1.5)
# Take only most significative ones
p.50 <- Peaks(tp, level = 0.5)
points(seq(y)[p.50], y[p.50], col = 3, cex = 2)
mtext("peaks at level = 50%", col = 3)
# Note that the result is very different than yours. Here, it is not
# the elevation of the peak that is a criterion, but whether you have
# several observations lower than the peak around it.
set.seed(2)
x <- round(rnorm(500), 2)
y <- cumsum(x)
plot(y, type="o", cex = 1/4)
p.1 <- Peaks(turnpoints(y), level = 0.01)
points(seq(y)[p.1], y[p.1], col = 3, cex = 2)
mtext("peaks al level = 1%", col=3)
You could read more about Kendall's information theory in the pastecs
manual located in /library/pastecs/doc/pastecs.pdf (pp 106-110, in
French) and cited references (quite old). Note that you do not
necessarily get the highest peaks in the series, because the criterion
here is whether you have several lower observations around a given peak
in the series, or not... not the absolute height of the peaks in a
moving window, as you do. Kendall's test makes sense for a time series
where you can have white noise on top of your signal: some of the
highest peaks my simply be positive random variations on top of values
that are not peaks in the general trend.
Now, if you want to retrieve the peaks the way you do from the
turnpoints object, it is relatively straightforward to do so from the
turnpoints object because *all* peaks and pits are returned in it (just
look whether each peak is the largest one in a given window.
A quick check for speed shows that your function is slightly faster than
turnpoints, but this is not considering that turnpoints also calculates
probability and quantity of information related to all peaks and pits.
On my machine (PIV 3Ghz), the normal distribution example with 1.000.000
data points took 32 seconds with your function and 44 seconds with
turnpoints... not a major difference, isn't it?
Best,
Philippe Grosjean
..............................................<°}))><........
) ) ) ) )
( ( ( ( ( Prof. Philippe Grosjean
) ) ) ) )
( ( ( ( ( Numerical Ecology of Aquatic Systems
) ) ) ) ) Mons-Hainaut University, Pentagone (3D08)
( ( ( ( ( Academie Universitaire Wallonie-Bruxelles
) ) ) ) ) 8, av du Champ de Mars, 7000 Mons, Belgium
( ( ( ( (
) ) ) ) ) phone: + 32.65.37.34.97, fax: + 32.65.37.30.54
( ( ( ( ( email: Philippe.Grosjean at umh.ac.be
) ) ) ) )
( ( ( ( ( web: http://www.umh.ac.be/~econum
) ) ) ) ) http://www.sciviews.org
( ( ( ( (
..............................................................
Dylan Beaudette wrote:
> On Nov 25, 2005, at 10:27 AM, Martin Maechler wrote:
>
>
>>Let me try to summarize my view on this:
>>
>>- I still it would make sense to have a *simple* peaks() function
>> in R which provides the same (or more) functionality as the
>> corresponding S-plus one.From
>> For a proper data analysis situation, I think one would have to
>> do something more sophisticated, based on a model (with a random
>> component), such as nonparametric regression, time-series,....
>> Hence peaks() should be kept as simple as reasonable.
>>
>>- Of course I know that which() or %in% can be used to deal
>> with logicals containing NAs {As a matter of fact, I've had
>> my fingers in both implementations for R!}.
>> Still, the main use of logical vectors in S often is for
>> situations where NAs only appear because of missing data:
>>
>> Indexing ([]), all(), any(), sum() are all very nice and
>> useful for logical vectors particularly when there are no NAs.
>>
>>- I agree that a different more flexible function returning
>> values from {-1,0,1} would be desirable, "for symmetry reasons".
>> ===> added a peaksign() function
>>
>>Here's code that implements the above {and other concerns
>>mentioned in this thread}, including some ``consistency
>>checking'' :
>>
>>peaks <- function(series, span = 3, do.pad = TRUE) {
>> if((span <- as.integer(span)) %% 2 != 1) stop("'span' must be odd")
>> s1 <- 1:1 + (s <- span %/% 2)
>> if(span == 1) return(rep.int(TRUE, length(series)))
>> z <- embed(series, span)
>> v <- apply(z[,s1] > z[, -s1, drop=FALSE], 1, all)
>> if(do.pad) {
>> pad <- rep.int(FALSE, s)
>> c(pad, v, pad)
>> } else v
>>}
>>
>>peaksign <- function(series, span = 3, do.pad = TRUE)
>>{
>> ## Purpose: return (-1 / 0 / 1) if series[i] is ( trough /
>>"normal" / peak )
>> ##
>>----------------------------------------------------------------------
>> ## Author: Martin Maechler, Date: 25 Nov 2005
>>
>> if((span <- as.integer(span)) %% 2 != 1 || span == 1)
>> stop("'span' must be odd and >= 3")
>> s1 <- 1:1 + (s <- span %/% 2)
>> z <- embed(series, span)
>> d <- z[,s1] - z[, -s1, drop=FALSE]
>> ans <- rep.int(0:0, nrow(d))
>> ans[apply(d > 0, 1, all)] <- as.integer(1)
>> ans[apply(d < 0, 1, all)] <- as.integer(-1)
>> if(do.pad) {
>> pad <- rep.int(0:0, s)
>> c(pad, ans, pad)
>> } else ans
>>}
>>
>>
>>check.pks <- function(y, span = 3)
>> stopifnot(identical(peaks( y, span), peaksign(y, span) == 1),
>> identical(peaks(-y, span), peaksign(y, span) == -1))
>>
>>for(y in list(1:10, rep(1,10), c(11,2,2,3,4,4,6,6,6))) {
>> for(sp in c(3,5,7))
>> check.pks(y, span = sp)
>> stopifnot(peaksign(y) == 0)
>>}
>>
>>y <- c(1,4,1,1,6,1,5,1,1) ; (ii <- which(peaks(y))); y[ii]
>>##- [1] 2 5 7
>>##- [1] 4 6 5
>>check.pks(y)
>>
>>set.seed(7)
>>y <- rpois(100, lambda = 7)
>>check.pks(y)
>>py <- peaks(y)
>>plot(y, type="o", cex = 1/4, main = "y and peaks(y,3)")
>>points(seq(y)[py], y[py], col = 2, cex = 1.5)
>>
>>p7 <- peaks(y,7)
>>points(seq(y)[p7], y[p7], col = 3, cex = 2)
>>mtext("peaks(y,7)", col=3)
>>
>>set.seed(2)
>>x <- round(rnorm(500), 2)
>>y <- cumsum(x)
>>check.pks(y)
>>
>>plot(y, type="o", cex = 1/4)
>>p15 <- peaks(y,15)
>>points(seq(y)[p15], y[p15], col = 3, cex = 2)
>>mtext("peaks(y,15)", col=3)
>>
>>______________________________________________
>>R-help at stat.math.ethz.ch mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide!
>>http://www.R-project.org/posting-guide.html
>>
>>
>
>
> Wow! This was the exact sort of simple peak finding algorithm I was
> looking for. Would it be ok for me to post this to our dept. webpage so
> that others may use it?
>
> Cheers,
>
> --
> Dylan Beaudette
> Soils and Biogeochemistry Graduate Group
> University of California at Davis
> 530.754.7341
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>
>
More information about the R-help
mailing list