[R] Classifying time series by shape over time
Andreas Neumann
Andreas.Neumann at em.uni-karlsruhe.de
Tue Mar 21 17:08:48 CET 2006
Dear all,
I have hundreds of thousands of univariate time series of the form:
character "seriesid", vector of Date, vector of integer
(some example data are at the end of this mail)
I am trying to find the ones which somehow "have a shape" over time that
looks like the histogram of a (skewed) normal distribution:
> hist(rnorm(200,10,2))
The location of the peak (the "mean") is not of interest, i.e. it does not
matter if the first nonzero observation happens in the 2nd or the 40th
month of observation.
So all that matters is: They should start sometime, the hits per month
increase, at some point they decrease and then they more or less
disappear.
Short example (hits in consecutive months, dates omitted):
Series 1: 0 0 0 2 5 8 20 42 30 19 6 1 0 0 0 -> Good
Series 2: 0 3 8 9 20 6 0 3 25 67 7 1 0 4 60 20 10 0 4 -> Bad
Series 1 would be an ideal case of what I am looking for.
Graphical inspection would be easy but is not an option because of the
large number of series.
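To make the criterion concrete, here is a minimal base R sketch of one
possible way to score "rises, then falls" for a single series. It is only an
illustration under my own assumptions, not an established test: the helper
name peak_violation is hypothetical, and any cut-off applied to its result
would have to be chosen by hand.

```r
# Hypothetical helper: share of the month-to-month change that runs against
# a single rise-then-fall shape (0 means perfectly single-peaked).
peak_violation <- function(hits) {
  d <- diff(hits)                       # month-to-month changes
  if (length(d) == 0 || all(d == 0)) return(NA_real_)  # no shape to judge
  peak <- which.max(hits)               # position of the first maximum
  before <- seq_along(d) < peak         # changes leading up to the peak
  wrong <- sum(pmax(-d[before], 0)) +   # decreases before the peak
           sum(pmax(d[!before], 0))     # increases after the peak
  wrong / sum(abs(d))
}

# The two short series from the example above
s1 <- c(0, 0, 0, 2, 5, 8, 20, 42, 30, 19, 6, 1, 0, 0, 0)
s2 <- c(0, 3, 8, 9, 20, 6, 0, 3, 25, 67, 7, 1, 0, 4, 60, 20, 10, 0, 4)

peak_violation(s1)
peak_violation(s2)
```

The score is 0 for series 1 and clearly positive for series 2, and leading or
trailing zeros do not change it. Noisy series would probably need smoothing
first, or a looser cut-off.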
Questions:
1. Which, if any, of the many packages that handle time series is
appropriate for my problem?
2. Which general approach seems to be the most straightforward and best
supported by R?
- Is there a way to test the time series directly (preferably)?
- Or do I need to "type-cast" them as some kind of histogram
data and then test against the pdf of e.g. a normal distribution (but
how)?
- Or something totally different?
Thank you for your time,
Andreas Neumann
Data Examples (id1 is good, id2 is bad):
> id1
        dates hits
1  2004-12-01    3
2  2005-01-01    4
3  2005-02-01   10
4  2005-03-01    6
5  2005-04-01   35
6  2005-05-01   14
7  2005-06-01   33
8  2005-07-01   13
9  2005-08-01    3
10 2005-09-01    9
11 2005-10-01    8
12 2005-11-01    4
13 2005-12-01    3
> id2
        dates hits
1  2001-01-01    6
2  2001-02-01    5
3  2001-03-01    5
4  2001-04-01    6
5  2001-05-01    2
6  2001-06-01    5
7  2001-07-01    1
8  2001-08-01    6
9  2001-09-01    4
10 2001-10-01   10
11 2001-11-01    0
12 2001-12-01    3
13 2002-01-01    6
14 2002-02-01    5
15 2002-03-01    1
16 2002-04-01    2
17 2002-05-01    4
18 2002-06-01    4
19 2002-07-01    0
20 2002-08-01    1
21 2002-09-01    0
22 2002-10-01    2
23 2002-11-01    2
24 2002-12-01    2
25 2003-01-01    2
26 2003-02-01    3
27 2003-03-01    7