[R] lowess + turnpoints = doubling integers?
Philippe Grosjean
phgrosjean at sciviews.org
Thu Jan 2 13:51:41 CET 2003
Hello Bob,
It seems that onsets and offsets are always a multiple of a number A (when
it is a multiple of 11, as in your example, you got 11, 22, 33, 44,..., thus
"double numbers"). Since changing either the length of the series, or f
argument in lowess changes A, it is an artifact introduced by lowess().
Lowess is a local regression method within a given window, and A appears to
be a portion of this window. You should use a different regression method
(depending on your actual data).
Here is a vectorized version of your calculation, with some examples.
Playing with it, I found a bug in turnpoints(), which behave strangely when
successive observations are very close from each other, for instance,
22.12367, 22.12371, ... It sometimes happens after using a lowess, as in the
example. I spotted the problem in the max.col() function that turnpoints()
uses. Probably rounding errors. The Splus version of the function works as
expected (see example here under). I'll correct this as soon as possible,
but in the meantime, you can use turnpoints2() provided hereunder.
Best,
Philippe Grosjean
...........]<(({°<...............<°}))><...............................
) ) ) ) )
( ( ( ( ( Dr. Philippe Grosjean
) ) ) ) )
( ( ( ( ( LOV, UMR 7093
) ) ) ) ) Station Zoologique
( ( ( ( ( Observatoire Océanologique
) ) ) ) ) BP 28
( ( ( ( ( 06234 Villefranche sur mer cedex
) ) ) ) ) France
( ( ( ( (
) ) ) ) ) tel: +33.4.93.76.38.16, fax: +33.4.93.76.38.34
( ( ( ( (
) ) ) ) ) e-mail: phgrosjean at sciviews.org
( ( ( ( ( SciViews project coordinator (http://www.sciviews.org)
) ) ) ) )
.......................................................................
# A vectorized version of your calculation
onset.offset <- function(tp){
if (is.null(class(tp)) || class(tp) != "turnpoints")
stop("tp must be a 'turnpoints' object!")
require(pastecs)
peaks <- tp$pos[tp$peaks]
pits <- tp$pos[tp$pits]
if (tp$firstispeak)
peaks <- peaks[-1] # Start with a pit
npks <- length(peaks)
npts <- length(pits)
if (npks == npts) {
onsets <- peaks - pits
offsets <- pits[-1] - peaks[-npks]
} else {
onsets <- peaks - pits[-npts]
offsets <- pits[-1] - peaks
}
list(peaks=peaks, onsets=onsets, offsets=offsets)
}
# turnpoints is in the PASTECS library
library(pastecs)
# an example that yields multiples of 11
set.seed(c(1,1))
foo <- rnorm(n=1500,mean=23.02, sd=34.14)
xx <- lowess(foo,f=.04)
yy <- turnpoints(xx$y)
onset.offset(yy)
# changing either the length of the series (n), or the window of LOWESS (f)
changes the "periodicity" of onsets/offsets
# A bug is detected in turnpoints (indeed, probably rounding errors in
max.col used by the R version of turnpoints)
set.seed(c(1,1))
foo <- rnorm(n=1800,mean=23.02, sd=34.14)
xx <- lowess(foo,f=.08)
yy <- turnpoints(xx$y)
onset.offset(yy)
# Using Splus version (without max.col) gives correct results:
library(ts)
is.R <- function() FALSE # To use Splus version of the function
set.seed(c(1,1))
foo <- rnorm(n=1800,mean=23.02, sd=34.14)
xx <- lowess(foo,f=.08)
yy <- turnpoints(xx$y)
onset.offset(yy)
remove(is.R) # This is dangerous to keep!
# A temporary workaround for the bug in turnpoints:
turnpoints2 <- function(x){
data <- deparse(substitute(x))
if (is.null(ncol(x)) == FALSE)
stop("Only one series can be treated at a time")
if (exists("is.R") && is.function(is.R) && is.R())
require(ts)
x <- as.vector(x)
n <- length(x)
diffs <- c(x[1] - 1, x[1:(n - 1)]) != x
uniques <- x[diffs]
n2 <- length(uniques)
poss <- (1:n)[diffs]
exaequos <- c(poss[2:n2], n + 1) - poss - 1
if (n2 < 3) {
warning("Less than 3 unique values, no calculation!")
nturns <- NA
firstispeak <- FALSE
peaks <- rep(FALSE, n2)
pits <- rep(FALSE, n2)
tppos <- NA
proba <- NA
info <- NA
}
else {
m <- n2 - 2
ex <- matrix(uniques[1:m + rep(3:1, rep(m, 3)) - 1], m)
peaks <- c(FALSE, apply(ex, 1, max, na.rm = TRUE) == ex[, 2], FALSE)
pits <- c(FALSE, apply(ex, 1, min, na.rm = TRUE) == ex[, 2], FALSE)
tpts <- peaks | pits
if (sum(tpts) == 0) {
nturns <- 0
firstispeak <- FALSE
peaks <- rep(FALSE, n2)
pits <- rep(FALSE, n2)
tppos <- NA
proba <- NA
info <- NA
}
else {
tppos <- (poss + exaequos)[tpts]
tptspos <- (1:n2)[tpts]
firstispeak <- tptspos[1] == (1:n2)[peaks][1]
nturns <- length(tptspos)
if (nturns < 2) {
inter <- n2 + 1
posinter1 <- tptspos[1]
}
else {
inter <- c(tptspos[2:nturns], n2) - c(1, tptspos[1:(nturns -
1)]) + 1
posinter1 <- tptspos - c(1, tptspos[1:(nturns -
1)])
}
posinter2 <- inter - posinter1
posinter <- pmax(posinter1, posinter2)
proba <- 2/(inter * gamma(posinter) * gamma(inter -
posinter + 1))
info <- -log(proba, base = 2)
}
}
res <- list(data = data, n = n, points = uniques, pos = (poss +
exaequos), exaequos = exaequos, nturns = nturns, firstispeak =
firstispeak,
peaks = peaks, pits = pits, tppos = tppos, proba = proba,
info = info)
class(res) <- "turnpoints"
res
}
-----Original Message-----
From: Bob Porter [mailto:rjporter at mindspring.com]
Sent: mardi 31 décembre 2002 1:16
To: phgrosjean at sciviews.org
Subject: Re: [R] lowess + turnpoints = doubling integers?
Hello Philippe:
Thank you for your interest. I have explored this and it appears to happen
with
sequences in which small differences between successive occurances,
somewhere in
turnpoints, but I have not tracked it down exactly. The following is a
cobbled
together demo based on the original code. m b, Bob
***********************************************
foo<-rnorm(n=1200,mean=23.02, sd=34.14) #based on stat of actual data
xx<-lowess(foo,f=.04) #note that changing f or length of foo (equivalent,
actually) changes the result
yy<- turnpoints(xx$y)
onsets<-seq(1:yy$nturns)
peaks<-seq(1:yy$nturns)
offsets<-seq(1:yy$nturns)
ievent<-1
istart<-1
if(yy$firstispeak) istart<-2 #always start with a pit
i<-istart
adj<-1 #dummy adjustment, means something in original code
while(i<=yy$nturns-istart-adj)
{
# points are processed in sets of three pit-peak-pit events,
starting with a pit
onsets[ievent]<-yy$tppos[i+1]-yy$tppos[i]#i is the onset point for
this
event, this calculates its duration
peaks[ievent]<-yy$tppos[i+1] #i+1 is the location of the peak point
offsets[ievent]<-yy$tppos[i+2]-yy$tppos[i+1] #calculate the offset
duration but subtracting peak point from offset point
i<-i+2 #set i to the offset point WHICH IS ALSO THE ONSET POINT FOR
THE
NEXT EVENT
ievent<-ievent+1
#cat(i," ",ievent,"// ")
}
events<-onsets+offsets
cat("\n",events,"events n=", nevents,"\n")
cat(onsets,"onsets\n")
cat(offsets,"offsets\n","\n")
*****************************************
----- Original Message -----
From: "Philippe Grosjean" <phgrosjean at sciviews.org>
To: "Bob Porter" <rjporter at mindspring.com>
Cc: <r-help at stat.math.ethz.ch>
Sent: Tuesday, December 31, 2002 4:55 AM
Subject: RE: [R] lowess + turnpoints = doubling integers?
> Bob,
> Happy New Year. Could you, please, cook an example so as we could spot the
> problem?
> Best,
>
> Philippe Grosjean
>
> ...........]<(({°<...............<°}))><...............................
> ) ) ) ) )
> ( ( ( ( ( Dr. Philippe Grosjean
> ) ) ) ) )
> ( ( ( ( ( LOV, UMR 7093
> ) ) ) ) ) Station Zoologique
> ( ( ( ( ( Observatoire Océanologique
> ) ) ) ) ) BP 28
> ( ( ( ( ( 06234 Villefranche sur mer cedex
> ) ) ) ) ) France
> ( ( ( ( (
> ) ) ) ) ) tel: +33.4.93.76.38.16, fax: +33.4.93.76.38.34
> ( ( ( ( (
> ) ) ) ) ) e-mail: phgrosjean at sciviews.org
> ( ( ( ( ( SciViews project coordinator (http://www.sciviews.org)
> ) ) ) ) )
> .......................................................................
>
>
>
> -----Original Message-----
> From: r-help-admin at stat.math.ethz.ch
> [mailto:r-help-admin at stat.math.ethz.ch]On Behalf Of Bob Porter
> Sent: dimanche 29 décembre 2002 8:52
> To: r-help at stat.math.ethz.ch
> Subject: [R] lowess + turnpoints = doubling integers?
>
>
> Happy New Year, r-helpers!
>
> I am using lowess to smooth a scatter plot,
> xx<-lowess(xinput,f=.04) #defaults for other args
> followed by
> turnpoints(xx$y) #defaults for other args
>
> I plot the smoothed result as well as turnpoints (using yy$tppos) on top
of
> raw
> data plot.
> Result is exactly as expected, graphically.
>
> For another purpose, I calcuate the difference between turnpoints
> (representing
> time intervals between turnspoints in my applicaiton), e.g.
>
> aduration[j]<-yy$tppos[i+1]-yytppos[i]
>
> This also appears to work as expected, HOWEVER, a typical list of such
> differences looks like:
> 22,33,22,11,33,44,33,33,11,33,44,66,33,22,.........
> there are, in some instances, three digit measure such as 110 or 106, etc.
> but
> the double-digit measures, although of the proper magnitude, seem to
always
> be
> double integers 11 to 99.
>
> The double-integer results are found with f=.04 and total length of vector
> of
> 1200. With vector of length 1100 (same input data), the results are NOT
> double-integer but are ALWAYS (?) of the form x0, e.g.
> 70,50,50,70,100,40............ The magnitude of these latter results (22
vs
> 70,
> for example) is evidently due to the change in the smoothing span from
> .04*1200
> to .04*1100, but I am mystified why the results have double-integers in
the
> first case, and terminal zeros in the second.
>
> My project involves applying these functions in a batch run for dozens of
> data
> sets so I want to understand more about what is going on that yeilds these
> strange double-integers or zero-terminating values.
>
> Thanks,
>
> Bob Porter, Tampa
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> http://www.stat.math.ethz.ch/mailman/listinfo/r-help
>
>
>
>
More information about the R-help
mailing list