[R] Peak Over Threshold values
William Dunlap
wdunlap at tibco.com
Thu May 27 22:13:21 CEST 2010
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of William Dunlap
> Sent: Thursday, May 27, 2010 12:24 PM
> To: Hutchinson,David [PYR]; Tonja Krueger
> Cc: r-help at r-project.org
> Subject: Re: [R] Peak Over Threshold values
>
> Here is another version, loopless, but still
> a bit clumsy (can the call to which be removed?):
Note that avoiding loops is mostly for
the aesthetic effect. On my aging laptop
the following loopy and vector-growing version
takes 3 seconds on a million-long vector
while the non-loopy one takes 0.22 seconds
(timings done with plot=FALSE). That is
a nice ratio but not much of a difference.
f0 <- function (x = walevel,
startThreshhold = 5.45,
stopThreshhold = 5.3,
plot = TRUE)
{
stopifnot(startThreshhold > stopThreshhold)
inRun <- FALSE
start <- integer()
stop <- integer()
for (i in seq_along(x)) {
if (inRun) {
if (x[i] < stopThreshhold) {
stop[length(stop) + 1] <- i - 1
inRun <- FALSE
}
}
else {
if (x[i] > startThreshhold) {
start[length(start) + 1] <- i
inRun <- TRUE
}
}
}
if (inRun)
stop[length(stop) + 1] <- length(x)
if (length(stop) > length(start))
stop <- stop[-1]
if (plot) {
plot(x, cex = 0.5)
abline(h = c(startThreshhold, stopThreshhold))
abline(v = start, col = "green")
abline(v = stop, col = "red")
}
data.frame(start = start, stop = stop)
}
>
> f <- function (x = walevel,
> startThreshhold = 5.45,
> stopThreshhold = 5.3,
> plot = TRUE)
> {
> stopifnot(startThreshhold > stopThreshhold)
> isFirstInRun <- function(x) c(TRUE, x[-1] != x[-length(x)])
> isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
> isOverStart <- x >= startThreshhold
> isOverStop <- x >= stopThreshhold
> possibleStartPt <- which(isFirstInRun(isOverStart) & isOverStart)
> possibleStopPt <- which(isLastInRun(isOverStop) & isOverStop)
> pts <- c(possibleStartPt, possibleStopPt)
> names(pts) <- rep(c("start", "stop"),
> c(length(possibleStartPt), length(possibleStopPt)))
> pts <- pts[order(pts)]
> tmp <- isFirstInRun(names(pts))
> start <- pts[tmp & names(pts) == "start"]
> stop <- pts[tmp & names(pts) == "stop"]
> if (length(stop) > length(start)) {
> # i.e., when first downcrossing happens before
> # first upcrossing
> stop <- stop[-1]
> }
> if (plot) {
> plot(x, cex = 0.5)
> abline(h = c(startThreshhold, stopThreshhold))
> abline(v = start, col = "green")
> abline(v = stop, col = "red")
> }
> data.frame(start = start, stop = stop)
> }
>
> # define the data in the original message and call f().
>
> The isFirstInRun and isLastInRun functions do the
> first part of the calculation that rle does and
> avoid the cumsum(diff()) roundtrip that
> cumsum(rle()$lengths) involves and their names
> make it clearer what I'm trying to do.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
> > -----Original Message-----
> > From: r-help-bounces at r-project.org
> > [mailto:r-help-bounces at r-project.org] On Behalf Of
> > Hutchinson,David [PYR]
> > Sent: Thursday, May 27, 2010 10:41 AM
> > To: Tonja Krueger
> > Cc: r-help at r-project.org
> > Subject: Re: [R] Peak Over Threshold values
> >
> > Perhaps not elegant, but does the job
> >
> > threshold <- 5.45
> > meetThreshold <- walevel >= threshold
> >
> > d <- rle(meetThreshold)
> > startPos <- c(1, 1 + cumsum(d$lengths[-length(d$lengths)]))
> >
> > abv <- which(d$values == TRUE)
> > p.o.t <- data.frame()
> >
> > for (i in seq( along = abv ) ) {
> > a <- startPos[abv[i]]
> > b <- a + (d$lengths[abv[i]] - 1)
> > e <- which.max(walevel[a:b])
> > p.o.t <- rbind( p.o.t, data.frame(
> > pos = a + e - 1,
> > val = walevel[a:b][e]
> > ) )
> > }
> >
> > plot (
> > day,
> > walevel, type = 'l'
> > )
> >
> > points(
> > p.o.t$pos,
> > p.o.t$val,
> > col = 'red'
> > )
> >
> > abline(h = threshold, lty = 2, col = 'red')
> >
> > -----Original Message-----
> > From: r-help-bounces at r-project.org
> > [mailto:r-help-bounces at r-project.org] On Behalf Of Tonja Krueger
> > Sent: Thursday, May 27, 2010 1:47 AM
> > To: Vito Muggeo (UniPa); Clint Bowman
> > Cc: r-help at r-project.org
> > Subject: [R] Peak Over Threshold values
> >
> >
> > I'm sorry, but that's not exactly what I was looking for.
> > I obviously
> > didn't explain properly:
> >
> > Within my dataframe (df) I would like to find POT values
> > that are not
> > linked. In my definition two maximum values are linked if
> > walevel does not
> > fall below a certain value (the lower threshold value).
> >
> > walevel <- c(5.75, 5.75, 5.75, 5.76, 5.80, 5.82, 5.85,
> > 5.86, 5.91, 5.91,
> > 5.88, 5.92, 5.99, 6.02, 6.00, 6.08, 6.11, 6.10, 6.10,
> > 6.11, 6.14, 6.12,
> > 6.15, 6.17, 6.15, 6.08, 6.01, 5.95, 5.89, 5.83, 5.77,
> > 5.72, 5.70, 5.65,
> > 5.59, 5.51, 5.43, 5.33, 5.30, 5.25, 5.22, 5.21, 5.19,
> > 5.19, 5.19, 5.13,
> > 5.15, 5.07, 5.04, 5.02, 4.99, 5.05, 5.07, 5.12, 5.17,
> > 5.22, 5.32, 5.29,
> > 5.33, 5.33, 5.36, 5.37, 5.42, 5.43, 5.48, 5.55, 5.57,
> > 5.66, 5.68, 5.72,
> > 5.77, 5.79, 5.81, 5.80, 5.80, 5.77, 5.72, 5.70, 5.66,
> > 5.62, 5.62, 5.61,
> > 5.59, 5.57, 5.51, 5.47, 5.41, 5.39, 5.40, 5.40, 5.42,
> > 5.45, 5.47, 5.50,
> > 5.50, 5.49, 5.43, 5.39, 5.33, 5.26)
> >
> > day <- c(1:100)
> >
> > df <- data.frame(day,walevel)
> >
> > For example in my dataframe a threshold value = 5.45 and
> > an lower threshold
> > value = 5.3 should lead to two maximum values because
> > between the 2^nd and
> > 3^rd peak walevel does not fall below the lower threshold value.
> >
> > With "clusters (...)" from "evd package", I could find
> > out POT values but
> > even though I set a lower threshold value for my example
> > dataframe I would
> > get three maximum values instead of two.
> >
> > library(evd)
> >
> > clusters(df$walevel,u =5.45, ulow= 5.3, r = 1, cmax=T)
> >
> > clusters(df$walevel,u =5.45, ulow= 5.3, r = 1, plot=T)
> >
> > Changing r to 30 (for example) connects the 2^nd and 3^rd
> > maximum events and
> > gives out the right 'end' for the first extreme event but
> > not for the second
> > event. (Also I'd rather not use a time interval to prove
> > that the events are
> > linked)
> >
> > clusters(df$walevel,u =5.45, ulow= 5.3, r = 30, cmax=T)
> >
> > clusters(df$walevel,u =5.45, ulow= 5.3, r = 30, plot=T)
> >
> > What went wrong???
> >
> > Tonja
> > -----Ursprüngliche Nachricht-----
> > How about?
> > hi.rle<-rle(walevel>5.79)
> > lo.rle<-rle(walevel<5.36)
> > plot(walevel)
> > abline(h=5.8,col=2,lty=3)
> > abline(h=5.35,col=3,lty=3)
> > hi.lo.rle<-sort(c(cumsum(hi.rle$lengths),cumsum(lo.rle$lengths)))
> > abline(v=hi.lo.rle)
> > You can use the $values from the rle to sort things out. Probably
> > want to ignore the end point at length(walevel).
> > --
> > Clint Bowman INTERNET: clint at ecy.wa.gov
> > Air Quality Modeler INTERNET: clint at math.utah.edu
> > Department of Ecology VOICE: (360) 407-6815
> > PO Box 47600 FAX: (360) 407-7534
> > Olympia, WA 98504-7600
> > On Wed, 26 May 2010, Vito Muggeo (UniPa) wrote:
> > > dear Tonja,
> > > By plotting your data
> > >
> > > plot(df)
> > >
> > > it seems to me that you are looking for a piecewise
> > linear relationships.
> > If
> > > this is the case, have a look to the package segmented.
> > You have to
> > specify
> > > or not the number and the starting values for the breakpoints
> > >
> > > library(segmented)
> > > olm<-lm(walevel~day)
> > >
> > > #specify number and starting values for the breakpoints..
> > > oseg<-segmented(olm, seg.Z=~day, psi=c(20,50,80))
> > > plot(oseg,add=TRUE,col=2)
> > > oseg$psi
> > >
> > > #or you can avoid to specify starting values for psi
> > > oseg<-segmented(olm, seg.Z=~day,
> > > psi=NA,control=seg.control(stop.if.error=FALSE,K=30))
> > >
> > > plot(oseg,add=TRUE,col=2)
> > > oseg$psi
> > >
> > >
> > > best,
> > > vito
> > >
> > >
> > > Tonja Krueger ha scritto:
> > >> Dear List
> > >>
> > >> I hope you can help me: I've got a dataframe (df)
> > within which I am
> > >> looking
> > >> for Peak Over Threshold values as well as the length of
> > the events. An
> > >> event
> > >> starts when walevel equals 5.8 and it should end when
> > walevel equals
> > >> the
> > >> lower threshold value (5.35).
> > >>
> > >> I tried "clusters (...)" from "evd package", and varied
> > r (see example)
> > >> but it
> > >> did not work for all events (again see example).
> > >>
> > >> walevel <- c(5.75, 5.75, 5.75, 5.76, 5.80, 5.82, 5.85,
> > 5.86, 5.91,
> > >> 5.91,
> > >> 5.88, 5.92, 5.99, 6.02, 6.00, 6.08, 6.11, 6.10, 6.10,
> > 6.11, 6.14, 6.12,
> > >> 6.15, 6.17, 6.15, 6.08, 6.01, 5.95, 5.89, 5.83, 5.77,
> > 5.72, 5.70, 5.65,
> > >> 5.59, 5.51, 5.43, 5.33, 5.30, 5.25, 5.22, 5.21, 5.19,
> > 5.19, 5.19, 5.13,
> > >> 5.15, 5.07, 5.04, 5.02, 4.99, 5.05, 5.07, 5.12, 5.17,
> > 5.22, 5.32, 5.29,
> > >> 5.33, 5.33, 5.36, 5.37, 5.42, 5.43, 5.48, 5.55, 5.57,
> > 5.66, 5.68, 5.72,
> > >> 5.77, 5.79, 5.81, 5.80, 5.80, 5.77, 5.72, 5.70, 5.66,
> > 5.62, 5.62, 5.61,
> > >> 5.59, 5.57, 5.51, 5.47, 5.41, 5.39, 5.40, 5.40, 5.42,
> > 5.45, 5.47, 5.50,
> > >> 5.50, 5.49, 5.43, 5.39, 5.33, 5.26)
> > >>
> > >> day <- c(1:100)
> > >>
> > >> df <- data.frame(day,walevel)
> > >>
> > >> library(evd)
> > >> clusters(df$walevel, u = 5.80, r = 1, ulow = 5.35, cmax
> > = T, plot = T)
> > >> clusters(df$walevel, u = 5.80, r = 50, ulow = 5.35,
> > cmax = T, plot = T)
> > >>
> > >> What have I done wrong?
> > >>
> > >> Tonja
> > >> ______________________________________________
> > >> R-help at r-project.org mailing list
> > >> [1]https://stat.ethz.ch/mailman/listinfo/r-help
> > >> PLEASE do read the posting guide
> > >> [2]http://www.R-project.org/posting-guide.html
> > >> and provide commented, minimal, self-contained,
> > reproducible code.
> > >>
> > >
> > >
> >
> > GRATIS: Movie-Flat mit über 300 Top-Videos. Für WEB.DE Nutzer
> > dauerhaft kostenlos! Jetzt freischalten unter
> > http://movieflat.web.de
> >
> > References
> >
> > 1.
> > file://localhost/../jump.htm?goto=https%3A%2F%2Fstat.ethz.ch%2
> Fmailman%2Flistinfo%2Fr-help
> > 2.
> > file://localhost/../jump.htm?goto=http%3A%2F%2Fwww.R-project.o
> rg%2Fposting-guide.html
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
More information about the R-help
mailing list