[R] Peak Over Threshold values
Tonja Krueger
tonja.krueger at web.de
Mon May 31 11:03:28 CEST 2010
Thanks a lot for your help. That’s the time period I was looking for.
I’ve got one more question: for further analyses I need the respective maximum values
within these time periods (between the green and red lines). Preferably in combination
with the date the maximum event happened.
Thank you
in advance,
Tonja
-----Ursprüngliche Nachricht-----
Von: William Dunlap <wdunlap at tibco.com>
Gesendet: 27.05.2010 22:13:21
An: "Hutchinson,David [PYR]" <David.Hutchinson at ec.gc.ca>,Tonja Krueger <tonja.krueger at web.de>
Betreff: RE: [R] Peak Over Threshold values
>> -----Original Message-----
>> From: r-help-bounces at r-project.org
>> [mailto:r-help-bounces at r-project.org] On Behalf Of William Dunlap
>> Sent: Thursday, May 27, 2010 12:24 PM
>> To: Hutchinson,David [PYR]; Tonja Krueger
>> Cc: r-help at r-project.org
>> Subject: Re: [R] Peak Over Threshold values
>>
>> Here is another version, loopless, but still
>> a bit clumsy (can the call to which be removed?):
>
>Note that avoiding loops is mostly for
>the aesthetic effect. On my aging laptop
>the following loopy and vector-growing version
>takes 3 seconds on a million-long vector
>while the non-loopy one takes 0.22 seconds
>(timings done with plot=FALSE). That is
>a nice ratio but not much of a difference.
>
>f0 <- function (x = walevel,
> startThreshhold = 5.45,
> stopThreshhold = 5.3,
> plot = TRUE)
>{
> stopifnot(startThreshhold > stopThreshhold)
> inRun <- FALSE
> start <- integer()
> stop <- integer()
> for (i in seq_along(x)) {
> if (inRun) {
> if (x[i] < stopThreshhold) {
> stop[length(stop) + 1] <- i - 1
> inRun <- FALSE
> }
> }
> else {
> if (x[i] > startThreshhold) {
> start[length(start) + 1] <- i
> inRun <- TRUE
> }
> }
> }
> if (inRun)
> stop[length(stop) + 1] <- length(x)
> if (length(stop) > length(start))
> stop <- stop[-1]
> if (plot) {
> plot(x, cex = 0.5)
> abline(h = c(startThreshhold, stopThreshhold))
> abline(v = start, col = "green")
> abline(v = stop, col = "red")
> }
> data.frame(start = start, stop = stop)
>}
>
>>
>> f <- function (x = walevel,
>> startThreshhold = 5.45,
>> stopThreshhold = 5.3,
>> plot = TRUE)
>> {
>> stopifnot(startThreshhold > stopThreshhold)
>> isFirstInRun <- function(x) c(TRUE, x[-1] != x[-length(x)])
>> isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
>> isOverStart <- x >= startThreshhold
>> isOverStop <- x >= stopThreshhold
>> possibleStartPt <- which(isFirstInRun(isOverStart) & isOverStart)
>> possibleStopPt <- which(isLastInRun(isOverStop) & isOverStop)
>> pts <- c(possibleStartPt, possibleStopPt)
>> names(pts) <- rep(c("start", "stop"),
>> c(length(possibleStartPt), length(possibleStopPt)))
>> pts <- pts[order(pts)]
>> tmp <- isFirstInRun(names(pts))
>> start <- pts[tmp & names(pts) == "start"]
>> stop <- pts[tmp & names(pts) == "stop"]
>> if (length(stop) > length(start)) {
>> # i.e., when first downcrossing happens before
>> # first upcrossing
>> stop <- stop[-1]
>> }
>> if (plot) {
>> plot(x, cex = 0.5)
>> abline(h = c(startThreshhold, stopThreshhold))
>> abline(v = start, col = "green")
>> abline(v = stop, col = "red")
>> }
>> data.frame(start = start, stop = stop)
>> }
>>
>> # define the data in the original message and call f().
>>
>> The isFirstInRun and isLastInRun functions do the
>> first part of the calculation that rle does and
>> avoid the cumsum(diff()) roundtrip that
>> cumsum(rle()$lengths) involves and their names
>> make it clearer what I'm trying to do.
>>
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com
>>
>> > -----Original Message-----
>> > From: r-help-bounces at r-project.org
>> > [mailto:r-help-bounces at r-project.org] On Behalf Of
>> > Hutchinson,David [PYR]
>> > Sent: Thursday, May 27, 2010 10:41 AM
>> > To: Tonja Krueger
>> > Cc: r-help at r-project.org
>> > Subject: Re: [R] Peak Over Threshold values
>> >
>> > Perhaps not elegant, but does the job
>> >
>> > threshold <- 5.45
>> > meetThreshold <- walevel >= threshold
>> >
>> > d <- rle(meetThreshold)
>> > startPos <- c(1, 1 + cumsum(d$lengths[-length(d$lengths)]))
>> >
>> > abv <- which(d$values == TRUE)
>> > p.o.t <- data.frame()
>> >
>> > for (i in seq( along = abv ) ) {
>> > a <- startPos[abv[i]]
>> > b <- a + (d$lengths[abv[i]] - 1)
>> > e <- which.max(walevel[a:b])
>> > p.o.t <- rbind( p.o.t, data.frame(
>> > pos = a + e - 1,
>> > val = walevel[a:b][e]
>> > ) )
>> > }
>> >
>> > plot (
>> > day,
>> > walevel, type = 'l'
>> > )
>> >
>> > points(
>> > p.o.t$pos,
>> > p.o.t$val,
>> > col = 'red'
>> > )
>> >
>> > abline(h = threshold, lty = 2, col = 'red')
>> >
>> > -----Original Message-----
>> > From: r-help-bounces at r-project.org
>> > [mailto:r-help-bounces at r-project.org] On Behalf Of Tonja Krueger
>> > Sent: Thursday, May 27, 2010 1:47 AM
>> > To: Vito Muggeo (UniPa); Clint Bowman
>> > Cc: r-help at r-project.org
>> > Subject: [R] Peak Over Threshold values
>> >
>> >
>> > I'm sorry, but that's not exactly what I was looking for.
>> > I obviously
>> > didn't explain properly:
>> >
>> > Within my dataframe (df) I would like to find POT values
>> > that are not
>> > linked. In my definition two maximum values are linked if
>> > walevel does not
>> > fall below a certain value (the lower threshold value).
>> >
>> > walevel <- c(5.75, 5.75, 5.75, 5.76, 5.80, 5.82, 5.85,
>> > 5.86, 5.91, 5.91,
>> > 5.88, 5.92, 5.99, 6.02, 6.00, 6.08, 6.11, 6.10, 6.10,
>> > 6.11, 6.14, 6.12,
>> > 6.15, 6.17, 6.15, 6.08, 6.01, 5.95, 5.89, 5.83, 5.77,
>> > 5.72, 5.70, 5.65,
>> > 5.59, 5.51, 5.43, 5.33, 5.30, 5.25, 5.22, 5.21, 5.19,
>> > 5.19, 5.19, 5.13,
>> > 5.15, 5.07, 5.04, 5.02, 4.99, 5.05, 5.07, 5.12, 5.17,
>> > 5.22, 5.32, 5.29,
>> > 5.33, 5.33, 5.36, 5.37, 5.42, 5.43, 5.48, 5.55, 5.57,
>> > 5.66, 5.68, 5.72,
>> > 5.77, 5.79, 5.81, 5.80, 5.80, 5.77, 5.72, 5.70, 5.66,
>> > 5.62, 5.62, 5.61,
>> > 5.59, 5.57, 5.51, 5.47, 5.41, 5.39, 5.40, 5.40, 5.42,
>> > 5.45, 5.47, 5.50,
>> > 5.50, 5.49, 5.43, 5.39, 5.33, 5.26)
>> >
>> > day <- c(1:100)
>> >
>> > df <- data.frame(day,walevel)
>> >
>> > For example in my dataframe a threshold value = 5.45 and
>> > an lower threshold
>> > value = 5.3 should lead to two maximum values because
>> > between the 2^nd and
>> > 3^rd peak walevel does not fall below the lower threshold value.
>> >
>> > With "clusters (...)" from "evd package", I could find
>> > out POT values but
>> > even though I set a lower threshold value for my example
>> > dataframe I would
>> > get three maximum values instead of two.
>> >
>> > library(evd)
>> >
>> > clusters(df$walevel,u =5.45, ulow= 5.3, r = 1, cmax=T)
>> >
>> > clusters(df$walevel,u =5.45, ulow= 5.3, r = 1, plot=T)
>> >
>> > Changing r to 30 (for example) connects the 2^nd and 3^rd
>> > maximum events and
>> > gives out the right 'end' for the first extreme event but
>> > not for the second
>> > event. (Also I'd rather not use a time interval to prove
>> > that the events are
>> > linked)
>> >
>> > clusters(df$walevel,u =5.45, ulow= 5.3, r = 30, cmax=T)
>> >
>> > clusters(df$walevel,u =5.45, ulow= 5.3, r = 30, plot=T)
>> >
>> > What went wrong???
>> >
>> > Tonja
>> > -----Ursprüngliche Nachricht-----
>> > How about?
>> > hi.rle<-rle(walevel>5.79)
>> > lo.rle<-rle(walevel<5.36)
>> > plot(walevel)
>> > abline(h=5.8,col=2,lty=3)
>> > abline(h=5.35,col=3,lty=3)
>> > hi.lo.rle<-sort(c(cumsum(hi.rle$lengths),cumsum(lo.rle$lengths)))
>> > abline(v=hi.lo.rle)
>> > You can use the $values from the rle to sort things out. Probably
>> > want to ignore the end point at length(walevel).
>> > --
>> > Clint Bowman INTERNET: clint at ecy.wa.gov
>> > Air Quality Modeler INTERNET: clint at math.utah.edu
>> > Department of Ecology VOICE: (360) 407-6815
>> > PO Box 47600 FAX: (360) 407-7534
>> > Olympia, WA 98504-7600
>> > On Wed, 26 May 2010, Vito Muggeo (UniPa) wrote:
>> > > dear Tonja,
>> > > By plotting your data
>> > >
>> > > plot(df)
>> > >
>> > > it seems to me that you are looking for a piecewise
>> > linear relationships.
>> > If
>> > > this is the case, have a look to the package segmented.
>> > You have to
>> > specify
>> > > or not the number and the starting values for the breakpoints
>> > >
>> > > library(segmented)
>> > > olm<-lm(walevel~day)
>> > >
>> > > #specify number and starting values for the breakpoints..
>> > > oseg<-segmented(olm, seg.Z=~day, psi=c(20,50,80))
>> > > plot(oseg,add=TRUE,col=2)
>> > > oseg$psi
>> > >
>> > > #or you can avoid to specify starting values for psi
>> > > oseg<-segmented(olm, seg.Z=~day,
>> > > psi=NA,control=seg.control(stop.if.error=FALSE,K=30))
>> > >
>> > > plot(oseg,add=TRUE,col=2)
>> > > oseg$psi
>> > >
>> > >
>> > > best,
>> > > vito
>> > >
>> > >
>> > > Tonja Krueger ha scritto:
>> > >> Dear List
>> > >>
>> > >> I hope you can help me: I've got a dataframe (df)
>> > within which I am
>> > >> looking
>> > >> for Peak Over Threshold values as well as the length of
>> > the events. An
>> > >> event
>> > >> starts when walevel equals 5.8 and it should end when
>> > walevel equals
>> > >> the
>> > >> lower threshold value (5.35).
>> > >>
>> > >> I tried "clusters (...)" from "evd package", and varied
>> > r (see example)
>> > >> but it
>> > >> did not work for all events (again see example).
>> > >>
>> > >> walevel <- c(5.75, 5.75, 5.75, 5.76, 5.80, 5.82, 5.85,
>> > 5.86, 5.91,
>> > >> 5.91,
>> > >> 5.88, 5.92, 5.99, 6.02, 6.00, 6.08, 6.11, 6.10, 6.10,
>> > 6.11, 6.14, 6.12,
>> > >> 6.15, 6.17, 6.15, 6.08, 6.01, 5.95, 5.89, 5.83, 5.77,
>> > 5.72, 5.70, 5.65,
>> > >> 5.59, 5.51, 5.43, 5.33, 5.30, 5.25, 5.22, 5.21, 5.19,
>> > 5.19, 5.19, 5.13,
>> > >> 5.15, 5.07, 5.04, 5.02, 4.99, 5.05, 5.07, 5.12, 5.17,
>> > 5.22, 5.32, 5.29,
>> > >> 5.33, 5.33, 5.36, 5.37, 5.42, 5.43, 5.48, 5.55, 5.57,
>> > 5.66, 5.68, 5.72,
>> > >> 5.77, 5.79, 5.81, 5.80, 5.80, 5.77, 5.72, 5.70, 5.66,
>> > 5.62, 5.62, 5.61,
>> > >> 5.59, 5.57, 5.51, 5.47, 5.41, 5.39, 5.40, 5.40, 5.42,
>> > 5.45, 5.47, 5.50,
>> > >> 5.50, 5.49, 5.43, 5.39, 5.33, 5.26)
>> > >>
>> > >> day <- c(1:100)
>> > >>
>> > >> df <- data.frame(day,walevel)
>> > >>
>> > >> library(evd)
>> > >> clusters(df$walevel, u = 5.80, r = 1, ulow = 5.35, cmax
>> > = T, plot = T)
>> > >> clusters(df$walevel, u = 5.80, r = 50, ulow = 5.35,
>> > cmax = T, plot = T)
>> > >>
>> > >> What have I done wrong?
>> > >>
>> > >> Tonja
>> > >> ______________________________________________
>> > >> R-help at r-project.org mailing list
>> > >> [1]https://stat.ethz.ch/mailman/listinfo/r-help
>> > >> PLEASE do read the posting guide
>> > >> [2]http://www.R-project.org/posting-guide.html
>> > >> and provide commented, minimal, self-contained,
>> > reproducible code.
>> > >>
>> > >
>> > >
>> >
>> > GRATIS: Movie-Flat mit über 300 Top-Videos. Für WEB.DE Nutzer
>> > dauerhaft kostenlos! Jetzt freischalten unter
>> > http://movieflat.web.de
>> >
>> > References
>> >
>> > 1.
>> > file://localhost/../jump.htm?goto=https%3A%2F%2Fstat.ethz.ch%2
>> Fmailman%2Flistinfo%2Fr-help
>> > 2.
>> > file://localhost/../jump.htm?goto=http%3A%2F%2Fwww.R-project.o
>> rg%2Fposting-guide.html
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
___________________________________________________________
GRATIS für alle WEB.DE Nutzer: Die maxdome Movie-FLAT!
Jetzt freischalten unter http://movieflat.web.de
More information about the R-help
mailing list