[R] Peak Over Threshold values

William Dunlap wdunlap at tibco.com
Thu May 27 22:13:21 CEST 2010


> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of William Dunlap
> Sent: Thursday, May 27, 2010 12:24 PM
> To: Hutchinson,David [PYR]; Tonja Krueger
> Cc: r-help at r-project.org
> Subject: Re: [R] Peak Over Threshold values
> 
> Here is another version, loopless, but still
> a bit clumsy (can the call to which be removed?):

Note that avoiding loops is mostly for
the aesthetic effect.  On my aging laptop
the following loopy and vector-growing version
takes 3 seconds on a million-long vector
while the non-loopy one takes 0.22 seconds
(timings done with plot=FALSE).  That is
a nice ratio but not much of a difference.

f0 <- function (x = walevel,
    startThreshhold = 5.45,
    stopThreshhold = 5.3, 
    plot = TRUE) 
{
    stopifnot(startThreshhold > stopThreshhold)
    inRun <- FALSE
    start <- integer()
    stop <- integer()
    for (i in seq_along(x)) {
        if (inRun) {
            if (x[i] < stopThreshhold) {
                stop[length(stop) + 1] <- i - 1
                inRun <- FALSE
            }
        }
        else {
            if (x[i] > startThreshhold) {
                start[length(start) + 1] <- i
                inRun <- TRUE
            }
        }
    }
    if (inRun) 
        stop[length(stop) + 1] <- length(x)
    if (length(stop) > length(start)) 
        stop <- stop[-1]
    if (plot) {
        plot(x, cex = 0.5)
        abline(h = c(startThreshhold, stopThreshhold))
        abline(v = start, col = "green")
        abline(v = stop, col = "red")
    }
    data.frame(start = start, stop = stop)
}

> 
> f <- function (x = walevel,
>                startThreshhold = 5.45,
>                stopThreshhold = 5.3, 
>                plot = TRUE) 
> {
>     stopifnot(startThreshhold > stopThreshhold)
>     isFirstInRun <- function(x) c(TRUE, x[-1] != x[-length(x)])
>     isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
>     isOverStart <- x >= startThreshhold
>     isOverStop <- x >= stopThreshhold
>     possibleStartPt <- which(isFirstInRun(isOverStart) & isOverStart)
>     possibleStopPt <- which(isLastInRun(isOverStop) & isOverStop)
>     pts <- c(possibleStartPt, possibleStopPt)
>     names(pts) <- rep(c("start", "stop"),
>         c(length(possibleStartPt), length(possibleStopPt)))
>     pts <- pts[order(pts)]
>     tmp <- isFirstInRun(names(pts))
>     start <- pts[tmp & names(pts) == "start"]
>     stop <- pts[tmp & names(pts) == "stop"]
>     if (length(stop) > length(start)) { 
>         # i.e., when first downcrossing happens before
>         # first upcrossing
>         stop <- stop[-1]
>     }
>     if (plot) {
>         plot(x, cex = 0.5)
>         abline(h = c(startThreshhold, stopThreshhold))
>         abline(v = start, col = "green")
>         abline(v = stop, col = "red")
>     }
>     data.frame(start = start, stop = stop)
> }
> 
> # define the data in the original message and call f().
> 
> The isFirstInRun and isLastInRun functions do the
> first part of the calculation that rle does and
> avoid the cumsum(diff()) roundtrip that
> cumsum(rle()$lengths) involves and their names
> make it clearer what I'm trying to do.
> 
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com  
> 
> > -----Original Message-----
> > From: r-help-bounces at r-project.org 
> > [mailto:r-help-bounces at r-project.org] On Behalf Of 
> > Hutchinson,David [PYR]
> > Sent: Thursday, May 27, 2010 10:41 AM
> > To: Tonja Krueger
> > Cc: r-help at r-project.org
> > Subject: Re: [R] Peak Over Threshold values
> > 
> > Perhaps not elegant, but does the job
> > 
> > threshold <- 5.45
> > meetThreshold <- walevel >= threshold
> > 
> > d <- rle(meetThreshold)
> > startPos <- c(1, 1 + cumsum(d$lengths[-length(d$lengths)]))
> > 
> > abv <- which(d$values == TRUE)
> > p.o.t <- data.frame()
> > 
> > for (i in seq( along = abv ) ) {
> >   a <- startPos[abv[i]]
> >   b <- a + (d$lengths[abv[i]] - 1)
> >   e <- which.max(walevel[a:b])
> >   p.o.t <- rbind( p.o.t, data.frame(
> >                pos = a + e - 1,
> >                val = walevel[a:b][e]
> >            ) )
> > }
> > 
> > plot (
> >   day,
> >   walevel, type = 'l'
> > )
> > 
> > points(
> >   p.o.t$pos,
> >   p.o.t$val,
> >   col = 'red'
> > )
> > 
> > abline(h = threshold, lty = 2, col = 'red')
> > 
> > -----Original Message-----
> > From: r-help-bounces at r-project.org 
> > [mailto:r-help-bounces at r-project.org] On Behalf Of Tonja Krueger
> > Sent: Thursday, May 27, 2010 1:47 AM
> > To: Vito Muggeo (UniPa); Clint Bowman
> > Cc: r-help at r-project.org
> > Subject: [R] Peak Over Threshold values
> > 
> > 
> >    I'm  sorry, but that's not exactly what I was looking for. 
> > I obviously
> >    didn't explain properly:
> > 
> >    Within  my dataframe (df) I would like to find POT values 
> > that are not
> >    linked. In my definition two maximum values are linked if 
> > walevel does not
> >    fall below a certain value (the lower threshold value).
> > 
> >    walevel <- c(5.75, 5.75, 5.75, 5.76, 5.80, 5.82, 5.85, 
> > 5.86, 5.91, 5.91,
> >    5.88, 5.92, 5.99, 6.02, 6.00, 6.08, 6.11, 6.10, 6.10, 
> > 6.11, 6.14, 6.12,
> >    6.15, 6.17, 6.15, 6.08, 6.01, 5.95, 5.89, 5.83, 5.77, 
> > 5.72, 5.70, 5.65,
> >    5.59, 5.51, 5.43, 5.33, 5.30, 5.25, 5.22, 5.21, 5.19, 
> > 5.19, 5.19, 5.13,
> >    5.15, 5.07, 5.04, 5.02, 4.99, 5.05, 5.07, 5.12, 5.17, 
> > 5.22, 5.32, 5.29,
> >    5.33, 5.33, 5.36, 5.37, 5.42, 5.43, 5.48, 5.55, 5.57, 
> > 5.66, 5.68, 5.72,
> >    5.77, 5.79, 5.81, 5.80, 5.80, 5.77, 5.72, 5.70, 5.66, 
> > 5.62, 5.62, 5.61,
> >    5.59, 5.57, 5.51, 5.47, 5.41, 5.39, 5.40, 5.40, 5.42, 
> > 5.45, 5.47, 5.50,
> >    5.50, 5.49, 5.43, 5.39, 5.33, 5.26)
> > 
> >    day <- c(1:100)
> > 
> >    df <- data.frame(day,walevel)
> > 
> >    For example in my dataframe  a threshold value = 5.45 and 
> > an lower threshold
> >    value = 5.3 should lead to two maximum values because 
> > between the 2^nd and
> >    3^rd peak walevel does not fall below the lower threshold value.
> > 
> >    With  "clusters (...)" from "evd package", I could find 
> > out POT values but
> >    even though I set a lower threshold value for my example 
> > dataframe I would
> >    get three maximum values instead of two.
> > 
> >    library(evd)
> > 
> >    clusters(df$walevel,u =5.45, ulow= 5.3, r = 1, cmax=T)
> > 
> >    clusters(df$walevel,u =5.45, ulow= 5.3, r = 1, plot=T)
> > 
> >    Changing r to 30 (for example) connects the 2^nd and 3^rd 
> > maximum events and
> >    gives out the right 'end' for the first extreme event but 
> > not for the second
> >    event. (Also I'd rather not use a time interval to prove 
> > that the events are
> >    linked)
> > 
> >    clusters(df$walevel,u =5.45, ulow= 5.3, r = 30, cmax=T)
> > 
> >    clusters(df$walevel,u =5.45, ulow= 5.3, r = 30, plot=T)
> > 
> >    What went wrong???
> > 
> >    Tonja
> >    -----Ursprüngliche Nachricht-----
> >    How about?
> >    hi.rle<-rle(walevel>5.79)
> >    lo.rle<-rle(walevel<5.36)
> >    plot(walevel)
> >    abline(h=5.8,col=2,lty=3)
> >    abline(h=5.35,col=3,lty=3)
> >    hi.lo.rle<-sort(c(cumsum(hi.rle$lengths),cumsum(lo.rle$lengths)))
> >    abline(v=hi.lo.rle)
> >    You can use the $values from the rle to sort things out. Probably
> >    want to ignore the end point at length(walevel).
> >    --
> >    Clint Bowman INTERNET: clint at ecy.wa.gov
> >    Air Quality Modeler INTERNET: clint at math.utah.edu
> >    Department of Ecology VOICE: (360) 407-6815
> >    PO Box 47600 FAX: (360) 407-7534
> >    Olympia, WA 98504-7600
> >    On Wed, 26 May 2010, Vito Muggeo (UniPa) wrote:
> >    > dear Tonja,
> >    > By plotting your data
> >    >
> >    > plot(df)
> >    >
> >    > it seems to me that you are looking for a piecewise 
> > linear relationships.
> >    If
> >    >  this is the case, have a look to the package segmented. 
> > You have to
> >    specify
> >    > or not the number and the starting values for the breakpoints
> >    >
> >    > library(segmented)
> >    > olm<-lm(walevel~day)
> >    >
> >    > #specify number and starting values for the breakpoints..
> >    > oseg<-segmented(olm, seg.Z=~day, psi=c(20,50,80))
> >    > plot(oseg,add=TRUE,col=2)
> >    > oseg$psi
> >    >
> >    > #or you can avoid to specify starting values for psi
> >    > oseg<-segmented(olm, seg.Z=~day,
> >    > psi=NA,control=seg.control(stop.if.error=FALSE,K=30))
> >    >
> >    > plot(oseg,add=TRUE,col=2)
> >    > oseg$psi
> >    >
> >    >
> >    > best,
> >    > vito
> >    >
> >    >
> >    > Tonja Krueger ha scritto:
> >    >> Dear List
> >    >>
> >    >> I hope you can help me: I've got a dataframe (df) 
> > within which I am
> >    >> looking
> >    >> for Peak Over Threshold values as well as the length of 
> > the events. An
> >    >> event
> >    >> starts when walevel equals 5.8 and it should end when 
> > walevel equals
> >    >> the
> >    >> lower threshold value (5.35).
> >    >>
> >    >> I tried "clusters (...)" from "evd package", and varied 
> > r (see example)
> >    >> but it
> >    >> did not work for all events (again see example).
> >    >>
> >    >> walevel <- c(5.75, 5.75, 5.75, 5.76, 5.80, 5.82, 5.85, 
> > 5.86, 5.91,
> >    >> 5.91,
> >    >> 5.88, 5.92, 5.99, 6.02, 6.00, 6.08, 6.11, 6.10, 6.10, 
> > 6.11, 6.14, 6.12,
> >    >> 6.15, 6.17, 6.15, 6.08, 6.01, 5.95, 5.89, 5.83, 5.77, 
> > 5.72, 5.70, 5.65,
> >    >> 5.59, 5.51, 5.43, 5.33, 5.30, 5.25, 5.22, 5.21, 5.19, 
> > 5.19, 5.19, 5.13,
> >    >> 5.15, 5.07, 5.04, 5.02, 4.99, 5.05, 5.07, 5.12, 5.17, 
> > 5.22, 5.32, 5.29,
> >    >> 5.33, 5.33, 5.36, 5.37, 5.42, 5.43, 5.48, 5.55, 5.57, 
> > 5.66, 5.68, 5.72,
> >    >> 5.77, 5.79, 5.81, 5.80, 5.80, 5.77, 5.72, 5.70, 5.66, 
> > 5.62, 5.62, 5.61,
> >    >> 5.59, 5.57, 5.51, 5.47, 5.41, 5.39, 5.40, 5.40, 5.42, 
> > 5.45, 5.47, 5.50,
> >    >> 5.50, 5.49, 5.43, 5.39, 5.33, 5.26)
> >    >>
> >    >> day <- c(1:100)
> >    >>
> >    >> df <- data.frame(day,walevel)
> >    >>
> >    >> library(evd)
> >    >> clusters(df$walevel, u = 5.80, r = 1, ulow = 5.35, cmax 
> > = T, plot = T)
> >    >> clusters(df$walevel, u = 5.80, r = 50, ulow = 5.35, 
> > cmax = T, plot = T)
> >    >>
> >    >> What have I done wrong?
> >    >>
> >    >> Tonja
> >    >> ______________________________________________
> >    >> R-help at r-project.org mailing list
> >    >> [1]https://stat.ethz.ch/mailman/listinfo/r-help
> >    >> PLEASE do read the posting guide
> >    >> [2]http://www.R-project.org/posting-guide.html
> >    >> and provide commented, minimal, self-contained, 
> > reproducible code.
> >    >>
> >    >
> >    >
> > 
> >    GRATIS: Movie-Flat mit über 300 Top-Videos. Für WEB.DE Nutzer
> >    dauerhaft kostenlos! Jetzt freischalten unter 
> > http://movieflat.web.de
> > 
> > References
> > 
> >    1. 
> > file://localhost/../jump.htm?goto=https%3A%2F%2Fstat.ethz.ch%2
> Fmailman%2Flistinfo%2Fr-help
> >    2. 
> > file://localhost/../jump.htm?goto=http%3A%2F%2Fwww.R-project.o
> rg%2Fposting-guide.html
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide 
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> > 
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide 
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> > 
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 



More information about the R-help mailing list