[R] Classifying time series by shape over time

Wed Mar 22 11:55:14 CET 2006

Hi,

turnpoints() in library(pastecs) determines whether the succession of peaks 
and pits is random or not. I think that the hypothesis here is a little 
bit stronger: the series should fit a Gaussian.
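
To illustrate what a randomness test on peaks and pits looks at, here is a minimal base R sketch of the classical turning-point test. It is not the pastecs implementation. It assumes the usual normal approximation (under randomness, n observations are expected to show 2(n - 2)/3 turning points, with variance (16n - 29)/90), it ignores plateaus, and the name turning_point_test is made up for this example.

```r
# Hypothetical helper: classical turning-point test of randomness.
turning_point_test <- function(x) {
  n <- length(x)
  d <- diff(x)
  # A turning point is a strict sign change between successive differences;
  # flat stretches (zero differences) are not counted.
  k <- sum(d[-1] * d[-length(d)] < 0)
  mu <- 2 * (n - 2) / 3          # expected count under randomness
  sigma2 <- (16 * n - 29) / 90   # variance under randomness
  z <- (k - mu) / sqrt(sigma2)
  c(turns = k, expected = mu, z = z, p.value = 2 * pnorm(-abs(z)))
}

# The "good" series from the original question
good <- c(0, 0, 0, 2, 5, 8, 20, 42, 30, 19, 6, 1, 0, 0, 0)
res_good <- turning_point_test(good)
print(res_good)
```

A single-peaked series has very few turning points, so a low count is necessary for the bell shape but does not establish it, which is why a stronger hypothesis is needed here.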

I have only thought a little about this problem, and I don't have a simple 
solution. Here is what I came up with, though it is certainly open to many 
criticisms (feel free to offer them!). The idea is to draw the cumulative 
distribution of the hits and fit it with a logistic curve. Then, 
predicted hits are back-calculated (knowing that the logistic curve is 
symmetrical around 'xmid'), and the observed and predicted distributions 
of the hits are compared using a Kolmogorov-Smirnov goodness-of-fit test:

# Enter example data
id1 <- data.frame(
   dates = as.Date(c("2004-12-01", "2005-01-01", "2005-02-01",
           "2005-03-01", "2005-04-01", "2005-05-01", "2005-06-01",
           "2005-07-01", "2005-08-01", "2005-09-01", "2005-10-01",
           "2005-11-01", "2005-12-01")),
   hits  = c(3, 4, 10, 6, 35, 14, 33, 13, 3, 9, 8, 4, 3))
id2 <- data.frame(
   dates =  as.Date(c("2001-01-01", "2001-02-01", "2001-03-01",
            "2001-04-01", "2001-05-01", "2001-06-01", "2001-07-01",
            "2001-08-01", "2001-09-01", "2001-10-01", "2001-11-01",
            "2001-12-01", "2002-01-01", "2002-02-01", "2002-03-01",
            "2002-04-01", "2002-05-01", "2002-06-01", "2002-07-01",
            "2002-08-01", "2002-09-01", "2002-10-01", "2002-11-01",
            "2002-12-01", "2003-01-01", "2003-02-01", "2003-03-01")),
   hits  = c(6, 5, 5, 6, 2, 5, 1, 6, 4, 10, 0, 3, 6,
             5, 1, 2, 4, 4, 0, 1, 0, 2, 2, 2, 2, 3, 7))

# What does it look like?
plot(id1$dates, id1$hits, type = "l")
plot(id2$dates, id2$hits, type = "l")

# Cumsum of hits and fit models
id1$datenum <- as.numeric(id1$dates)
id1$cumhits <- cumsum(id1$hits)
id1.fit <- nls(cumhits ~ SSlogis(datenum, Asym, xmid, scal), data = id1)
summary(id1.fit)
plot(id1$dates, id1$cumhits)
lines(id1$dates, predict(id1.fit))

id2$datenum <- as.numeric(id2$dates)
id2$cumhits <- cumsum(id2$hits)
id2.fit <- nls(cumhits ~ SSlogis(datenum, Asym, xmid, scal), data = id2)
summary(id2.fit)
plot(id2$dates, id2$cumhits)
lines(id2$dates, predict(id2.fit))

# Get xmid and recalculate predicted values for hits
xmid1 <- coef(id1.fit)["xmid"]
id1$hitspred <- predict(id1.fit,
     newdata = data.frame(datenum = xmid1 - abs(id1$datenum - xmid1)))
plot(id1$dates, id1$hits, ylim = range(c(id1$hits, id1$hitspred)))
lines(id1$dates, id1$hitspred)

xmid2 <- coef(id2.fit)["xmid"]
id2$hitspred <- predict(id2.fit,
     newdata = data.frame(datenum = xmid2 - abs(id2$datenum - xmid2)))
plot(id2$dates, id2$hits, ylim = range(c(id2$hits, id2$hitspred)))
lines(id2$dates, id2$hitspred)
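
The back-calculation above relies on the symmetry of the logistic curve around xmid. The sketch below checks that symmetry numerically, using made-up parameter values rather than the fitted ones; logis_cum is a hypothetical helper that writes out the formula documented for SSlogis(). It also shows that the folded curve stays on the cumulative scale (it peaks at Asym / 2), so per-step increments of the curve are computed as one possible alternative on the scale of the monthly hits. That alternative is only a suggestion, not part of the method above.

```r
# Logistic curve in the SSlogis() parameterisation (hypothetical helper)
logis_cum <- function(x, Asym, xmid, scal) Asym / (1 + exp((xmid - x) / scal))

# Illustrative parameter values (assumed, not estimated from the data)
Asym <- 100; xmid <- 10; scal <- 2
x <- 0:20   # grid symmetric around xmid

cum <- logis_cum(x, Asym, xmid, scal)
# Point symmetry about (xmid, Asym / 2): f(xmid - d) + f(xmid + d) = Asym
sym_ok <- isTRUE(all.equal(cum + rev(cum), rep(Asym, length(x))))

# Folding x onto the left branch, as in the back-calculation above
folded <- logis_cum(xmid - abs(x - xmid), Asym, xmid, scal)
peak <- max(folded)   # reached at x = xmid

# Per-step increments of the cumulative curve, on the scale of monthly counts
incr <- diff(logis_cum(c(x[1] - 1, x), Asym, xmid, scal))
```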

# A two-sample Kolmogorov-Smirnov goodness-of-fit test
ks.test(id1$hits, id1$hitspred)  # H0 not rejected
ks.test(id2$hits, id2$hitspred)  # H0 rejected
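
Since graphical inspection is not an option for many series, the steps above could be wrapped in a function and applied to a list of series. The following is only a sketch under several assumptions: the name fit_peak_shape is made up, the observation index is used in place of the dates (equally spaced months), and a failed nls() fit is simply reported as not converged.

```r
# Hypothetical wrapper around the steps above: cumulative sum, logistic fit,
# back-calculation around xmid, and two-sample Kolmogorov-Smirnov test.
fit_peak_shape <- function(hits, x = seq_along(hits)) {
  d <- data.frame(x = x, cumhits = cumsum(hits))
  fit <- tryCatch(
    nls(cumhits ~ SSlogis(x, Asym, xmid, scal), data = d),
    error = function(e) NULL)   # nls() often fails on irregular series
  if (is.null(fit)) {
    return(c(converged = 0, xmid = NA_real_, scal = NA_real_, ks.p = NA_real_))
  }
  xmid <- coef(fit)[["xmid"]]
  pred <- predict(fit, newdata = data.frame(x = xmid - abs(d$x - xmid)))
  # ks.test() warns about ties, which are common in count data
  ks <- suppressWarnings(ks.test(hits, pred))
  c(converged = 1, xmid = xmid, scal = coef(fit)[["scal"]], ks.p = ks$p.value)
}

# The two short series from the original question
series <- list(
  good = c(0, 0, 0, 2, 5, 8, 20, 42, 30, 19, 6, 1, 0, 0, 0),
  bad  = c(0, 3, 8, 9, 20, 6, 0, 3, 25, 67, 7, 1, 0, 4, 60, 20, 10, 0, 4))
res <- t(sapply(series, fit_peak_shape))
print(res)
```

Each row of res describes one series, so the result can be filtered on the convergence flag and the p-value; any threshold used for that is a judgment call.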

Best,

Philippe Grosjean

Kjetil Brinchmann Halvorsen wrote:
> Andreas Neumann wrote:
> 
>>Dear all,
>>
>>I have hundreds of thousands of univariate time series of the form:
>>character "seriesid", vector of Date, vector of integer
>>(some exemplary data is at the end of the mail)
>>
>>I am trying to find the ones which somehow "have a shape" over time that
>>looks like the histogram of a (skewed) normal distribution:
>>
>>> hist(rnorm(200,10,2))
>>
>>The "mean" is not interesting, i.e. it does not matter if the first
>>nonzero observation happens in the 2nd or the 40th month of observation.
>>So all that matters is: They should start sometime, the hits per month
>>increase, at some point they decrease and then they more or less
>>disappear.
>>
>>Short Example (hits at consecutive months (Dates omitted)):
>>1. series: 0 0 0 2 5 8 20 42 30 19 6 1 0 0 0                -> Good
>>2. series: 0 3 8 9 20 6 0 3 25 67 7 1 0 4 60 20 10 0 4      -> Bad
>>
>>Series 1 would be an ideal case of what I am looking for.
>>
>>Graphical inspection would be easy but is not an option due to the huge
>>amount of series.
>>
> 
> 
> Does function turnpoints() in package pastecs help?
> 
> Kjetil
> 
> 
>>Questions:
>>
>>1. Which (if at all) of the many packages that handle time series is
>>appropriate for my problem?
>>
>>2. Which general approach seems to be the most straightforward and best
>>supported by R?
>>- Is there a way to test the time series directly (preferably)?
>>- Or do I need to "type-cast" them as some kind of histogram
>>  data and then test against the pdf of e.g. a normal distribution (but
>>  how)?
>>- Or something totally different?
>>
>>
>>Thank you for your time,
>>
>>     Andreas Neumann
>>
>>
>>
>>
>>Data Examples (id1 is good, id2 is bad):
>>
>>
>>>id1
>>
>>        dates       hits
>>1  2004-12-01         3
>>2  2005-01-01         4
>>3  2005-02-01        10
>>4  2005-03-01         6
>>5  2005-04-01        35
>>6  2005-05-01        14
>>7  2005-06-01        33
>>8  2005-07-01        13
>>9  2005-08-01         3
>>10 2005-09-01         9
>>11 2005-10-01         8
>>12 2005-11-01         4
>>13 2005-12-01         3
>>
>>
>>
>>>id2
>>
>>        dates       hits
>>1  2001-01-01         6
>>2  2001-02-01         5
>>3  2001-03-01         5
>>4  2001-04-01         6
>>5  2001-05-01         2
>>6  2001-06-01         5
>>7  2001-07-01         1
>>8  2001-08-01         6
>>9  2001-09-01         4
>>10 2001-10-01        10
>>11 2001-11-01         0
>>12 2001-12-01         3
>>13 2002-01-01         6
>>14 2002-02-01         5
>>15 2002-03-01         1
>>16 2002-04-01         2
>>17 2002-05-01         4
>>18 2002-06-01         4
>>19 2002-07-01         0
>>20 2002-08-01         1
>>21 2002-09-01         0
>>22 2002-10-01         2
>>23 2002-11-01         2
>>24 2002-12-01         2
>>25 2003-01-01         2
>>26 2003-02-01         3
>>27 2003-03-01         7
>>
>>______________________________________________
>>R-help at stat.math.ethz.ch mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>>