[R] multiple comparisons of time series data
Spencer Graves
spencer.graves at pdf.com
Sun May 28 22:45:23 CEST 2006
PAIRWISE KOLMOGOROV-SMIRNOV:
I don't know, but it looks like you could just type "pairwise.t.test"
at a command prompt, copy the code into an R script file, and create a
function "pairwise.ks.test" just by replacing the call to "t.test" with
one to "ks.test". Try it. If you have trouble making it work, submit a
post on that.
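For what it's worth, a minimal sketch of such a function might look like the following. It assumes the same overall structure as "pairwise.t.test": a helper compares two groups, and "pairwise.table" applies it to every pair and adjusts the p.values. The argument names x and g and the "method" label are illustrative choices, not taken from any existing package.

```r
# Sketch only: pairwise two-sample KS tests with p-value adjustment.
# x = numeric vector of observations, g = grouping vector (one level per series).
pairwise.ks.test <- function(x, g, p.adjust.method = p.adjust.methods, ...) {
  DNAME <- paste(deparse(substitute(x)), "and", deparse(substitute(g)))
  g <- factor(g)
  p.adjust.method <- match.arg(p.adjust.method)  # defaults to "holm"
  # p-value of the two-sample KS test between group levels i and j
  compare.levels <- function(i, j) {
    xi <- x[as.integer(g) == i]
    xj <- x[as.integer(g) == j]
    ks.test(xi, xj, ...)$p.value
  }
  # pairwise.table() fills the lower triangle and calls p.adjust()
  PVAL <- pairwise.table(compare.levels, levels(g), p.adjust.method)
  ans <- list(method = "two-sample Kolmogorov-Smirnov tests",
              data.name = DNAME, p.value = PVAL,
              p.adjust.method = p.adjust.method)
  class(ans) <- "pairwise.htest"
  ans
}

# Toy usage with made-up, independent data (NOT time series)
set.seed(1)
x <- c(rnorm(50), rnorm(50, mean = 3), rnorm(50))
g <- rep(c("a", "b", "c"), each = 50)
res <- pairwise.ks.test(x, g, p.adjust.method = "bonferroni")
print(res)
```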
I would NOT do this, however, because the "ks.test" assumes samples
of INDEPENDENT observations. If you've got time series, I would expect
the assumption of independence to be violated, and I would not believe
the results of a KS test. If you want to try what I just suggested,
please also try it with multiple time series WITHOUT "varying our
representation of the stream within the model", preferably several times.
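One way to run that kind of check on synthetic data: simulate pairs of series from the SAME process, so that every rejection is a false alarm, and count how often ks.test rejects at the 5% level. The AR(1) coefficient 0.9, the series length and the number of replications below are arbitrary assumptions for illustration; they are not from the original question.

```r
set.seed(42)
n.rep <- 200   # number of simulated pairs (assumed)
n     <- 200   # length of each series (assumed)

# Baseline: two independent samples from the same normal distribution
p.iid <- replicate(n.rep, ks.test(rnorm(n), rnorm(n))$p.value)

# Two serially correlated AR(1) series generated by the same model
p.ar <- replicate(n.rep, {
  x <- as.numeric(arima.sim(model = list(ar = 0.9), n = n))
  y <- as.numeric(arima.sim(model = list(ar = 0.9), n = n))
  ks.test(x, y)$p.value
})

# Proportion of (false) rejections at the 5% level in each case
reject.iid <- mean(p.iid < 0.05)
reject.ar  <- mean(p.ar  < 0.05)
print(c(iid = reject.iid, ar1 = reject.ar))
```

If independence did not matter, both rates would be near 0.05. With strong serial correlation I would expect reject.ar to come out well above that.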
COMPARING MULTIPLE TIME SERIES
If I had k different time series to compare, I might proceed as
follows:
1. Make normal probability plots using, e.g., qqnorm. If the
observations did NOT look normal, I'd consider some transformation. If
the numbers were all positive, I might consider using the "boxcox"
function in library(MASS) to help select one. However, I wouldn't
completely believe the results, because this also assumes the
observations are independent, and I know they're not.
2. Try to fit some traditional time series model as described, e.g.,
in the chapter on time series in Venables and Ripley (2002) Modern
Applied Statistics with S (Springer). There are better books on time
series, but this is probably the first book I would recommend to anyone
using R, and this chapter would be a reasonable start. I'd play with
this until I seemed to get sensible fits for nearly all series with the
same model and with residuals that looked fairly though not totally (a)
white by the Box-Ljung criteria, and (b) normal in normal probability
plots. If I saw consistent non-normal behavior in the residuals, it
would indicate a problem bigger than I can handle in a brief email like
this.
3. With k different time series, most of the results of "2" could be
summarized in k sets of estimated regression coefficients, all for the
same model, with estimated standard errors plus whitened residuals. If
you had m parameters, each pair of time series could then be summarized
into m z-scores = (b.i-b.j)/sqrt(var.b.i+var.b.j), which could then be
further converted into m p.values. You would then add the p.value from
ks.test, making (m+1) p.values for each of the k*(k-1)/2 = 10 pairs of
series with k = 5 series. I'd then feed these 10*(m+1) p.values into
"p.adjust" to get an answer. (Note: "pairwise.t.test" calls
"pairwise.table", which further calls "p.adjust". I didn't know any of
this before I read your post.) I might experiment with the different
"methods" for p.adjust, and if I got different answers from the different
methods, I might worry about which to believe. The Bonferroni is the
simplest, most widely known and understood, but also perhaps the most
conservative. I might tend to believe some of the others more, but if I
got different answers, I'd suspect that the case was marginal, and I
might want to generate other sets of simulations and try those.
4. There are other facilities in R for multiple comparisons, e.g.,
in the multcomp and pgirmess packages. Before I actually undertook
steps 1, 2, and 3, above, I might review these packages to familiarize
myself more with their contents.
5. Virginia Tech has an excellent Statistics department with a
consulting center. You might try them.
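Steps 1 to 3 above might be sketched roughly as follows. Everything specific here is an assumption made for illustration: the simulated stand-in data, the log transform, the AR(1) model (so m = 2 parameters: ar1 and intercept), the Ljung-Box lag of 10, and the choice to run ks.test on the whitened residuals rather than on the raw series. The z-scores also treat the estimates from different series as independent.

```r
library(MASS)  # for boxcox()

set.seed(1)
k <- 5     # number of series
n <- 365   # observations per series (assumed)
# Hypothetical stand-in data: k positive series, AR(1) on the log scale
series <- lapply(seq_len(k), function(i)
  exp(2 + as.numeric(arima.sim(model = list(ar = 0.7), n = n))))

# Step 1: normal probability plot and Box-Cox profile for one series
y1 <- series[[1]]
qqnorm(y1); qqline(y1)
bc <- boxcox(y1 ~ 1, plotit = FALSE)
lambda <- bc$x[which.max(bc$y)]  # a value near 0 would suggest a log transform

# Step 2: fit the same model to every transformed series, check whiteness
fits <- lapply(series, function(y) arima(log(y), order = c(1, 0, 0)))
lb.p <- sapply(fits, function(f)
  Box.test(residuals(f), lag = 10, type = "Ljung-Box", fitdf = 1)$p.value)

# Step 3: m z-scores per pair from coefficients and their variances
b <- sapply(fits, coef)                           # m x k estimates
v <- sapply(fits, function(f) diag(f$var.coef))   # m x k variances
pairs <- combn(k, 2)                              # 2 x choose(k, 2) index pairs
p.coef <- apply(pairs, 2, function(ij) {
  z <- (b[, ij[1]] - b[, ij[2]]) / sqrt(v[, ij[1]] + v[, ij[2]])
  2 * pnorm(-abs(z))                              # two-sided p-values
})
# One KS p-value per pair, computed on the whitened residuals
p.ks <- apply(pairs, 2, function(ij)
  ks.test(as.numeric(residuals(fits[[ij[1]]])),
          as.numeric(residuals(fits[[ij[2]]])))$p.value)

p.all <- rbind(p.coef, ks = p.ks)                 # (m + 1) x choose(k, 2)
# Adjust all p-values together under a few different methods
p.adj <- sapply(c("bonferroni", "holm", "BH"),
                function(meth) p.adjust(as.vector(p.all), method = meth))
print(round(p.adj, 3))
```

Comparing the columns of p.adj shows how much the conclusion depends on the adjustment method. If they disagree about which comparisons are significant, I would treat the case as marginal.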
Hope this helps,
Spencer Graves
Kyle Hall wrote:
> I am interested in a statistical comparison of multiple (5) time series
> generated from modeling software (Hydrologic Simulation Program Fortran). The
> model output simulates daily bacteria concentration in a stream. The multiple
> time series are a result of varying our representation of the stream within
> the model.
>
> Our main question is: Do the different methods used to represent a stream
> produce different results at a statistically significant level?
>
> We want to compare each output time series to determine if there is a
> difference before looking into the cause within the model. In a previous
> study, the Kolmogorov-Smirnov k-sample test was used to compare multiple time
> series.
>
> I am unsure about the strength of the Kolmogorov-Smirnov test and I have set
> out to determine if there are any other tests to compare multiple time
> series.
>
> I know that R has the ks.test but I am unsure how this test handles multiple
> comparisons. Is there something similar to a pairwise.t.test with a
> Bonferroni correction, only with time series data?
>
> Does R currently (v 2.3.0) have a comparison test that takes into account the
> strong serial correlation of time series data?
>
>
> Kyle Hall
>
> Graduate Research Assistant
> Biological Systems Engineering
> Virginia Tech
>