[R] multiple comparisons of time series data

Sun May 28 22:45:23 CEST 2006

PAIRWISE KOLMOGOROV-SMIRNOV:

	  I don't know, but it looks like you could just type "pairwise.t.test" 
at a command prompt to print its source, copy the code into an R script 
file, and create a function "pairwise.ks.test" by replacing the call to 
"t.test" with one to "ks.test".  Try it.  If you have trouble making it 
work, submit a post on that.
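
	  A minimal sketch of what such a function might look like, modeled on 
the structure of "pairwise.t.test" (it reuses "pairwise.table" and the 
"pairwise.htest" print method).  The function body and the simulated data 
are illustrative assumptions, not tested against real model output:

```r
# Hypothetical pairwise two-sample KS test, patterned on pairwise.t.test().
pairwise.ks.test <- function(x, g, p.adjust.method = p.adjust.methods, ...) {
  DNAME <- paste(deparse(substitute(x)), "and", deparse(substitute(g)))
  g <- factor(g)
  p.adjust.method <- match.arg(p.adjust.method)
  # p-value for comparing level i with level j of the grouping factor
  compare.levels <- function(i, j) {
    xi <- x[as.integer(g) == i]
    xj <- x[as.integer(g) == j]
    ks.test(xi, xj, ...)$p.value
  }
  PVAL <- pairwise.table(compare.levels, levels(g), p.adjust.method)
  ans <- list(method = "two-sample Kolmogorov-Smirnov tests",
              data.name = DNAME, p.value = PVAL,
              p.adjust.method = p.adjust.method)
  class(ans) <- "pairwise.htest"
  ans
}

# Made-up INDEPENDENT data: groups "a" and "b" share a distribution,
# group "c" is shifted.
set.seed(1)
x <- c(rnorm(50), rnorm(50), rnorm(50, mean = 2))
g <- rep(c("a", "b", "c"), each = 50)
res <- pairwise.ks.test(x, g, p.adjust.method = "bonferroni")
print(res)
```

	  The result has the same lower-triangular layout of adjusted p.values 
that "pairwise.t.test" returns.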

	  I would NOT do this, however, because the "ks.test" assumes samples 
of INDEPENDENT observations.  If you've got time series, I would expect 
the assumption of independence to be violated, and I would not believe 
the results of a KS test.  If you want to try what I just suggested, 
please also try it with multiple time series WITHOUT "varying our 
representation of the stream within the model", preferably several times.
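
	  One way to see the problem is a small simulation, sketched here 
under assumed settings (AR(1) series with coefficient 0.9, 200 
observations, 200 replications; none of these numbers come from your 
model).  Both series in each pair come from the SAME process, so a valid 
test should reject about 5% of the time:

```r
set.seed(123)
n.rep <- 200   # number of simulated pairs (assumed)
n <- 200       # length of each series (assumed)

# Fraction of pairs from the same process for which ks.test gives p < 0.05
reject.rate <- function(sim) {
  mean(replicate(n.rep, ks.test(sim(), sim())$p.value < 0.05))
}

# Independent observations: the KS assumptions hold
rate.iid <- reject.rate(function() rnorm(n))

# Strongly autocorrelated observations: the independence assumption fails
rate.ar1 <- reject.rate(function() {
  as.numeric(arima.sim(model = list(ar = 0.9), n = n))
})

print(c(iid = rate.iid, ar1 = rate.ar1))
```

	  If the rejection rate for the autocorrelated pairs comes out well 
above the nominal level, that is the false-positive inflation I would 
worry about with daily concentration series.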

COMPARING MULTIPLE TIME SERIES

	  If I had k different time series to compare, I might proceed as 
follows:

	  1.  Make normal probability plots using, e.g., qqnorm.  If the 
observations did NOT look normal, I'd consider some transformation.  If 
the numbers were all positive, I might consider using the "boxcox" 
function in library(MASS) to help select one.  However, I wouldn't 
completely believe the results, because this also assumes the 
observations are independent, and I know they're not.

	  2.  Try to fit some traditional time series model as described, e.g., 
in the chapter on time series in Venables and Ripley (2002) Modern 
Applied Statistics with S (Springer).  There are better books on time 
series, but this is probably the first book I would recommend to anyone 
using R, and this chapter would be a reasonable start.  I'd play with 
this until I seemed to get sensible fits for nearly all series with the 
same model and with residuals that looked fairly though not totally (a) 
white by the Box-Ljung criteria, and (b) normal in normal probability 
plots.  If I saw consistent non-normal behavior in the residuals, it 
would indicate a problem bigger than I can handle in a brief email like 
this.

	  3.  With k different time series, most of the results of "2" could be 
summarized in k sets of estimated regression coefficients, all for the 
same model, with estimated standard errors plus whitened residuals.  If 
you had m parameters, each pair of time series could then be summarized 
into m z-scores = (b.i-b.j)/sqrt(var.b.i+var.b.j), which could then be 
further converted into m p.values.  You would then add the p.value from 
ks.test, making (m+1) p.values for each of the k*(k-1)/2 = 10 pairs of 
series with k = 5 series.  I'd then feed these 10*(m+1) p.values into 
"p.adjust" to get an answer.  (Note:  "pairwise.t.test" calls 
"pairwise.table", which further calls "p.adjust".  I didn't know any of 
this before I read your post.)  I might experiment with the different 
"methods" for p.adjust, and if I got different answers from the different 
methods, I might worry about which to believe.  The Bonferroni is the 
simplest, most widely known and understood, but also perhaps the most 
conservative.  I might tend to believe some of the others more, but if I 
got different answers, I'd suspect that the case was marginal, and I 
might want to generate other sets of simulations and try those.

	  4.  There are other facilities in R for multiple comparisons, e.g., 
in the multcomp and pgirmess packages.  Before I actually undertook 
steps 1, 2, and 3, above, I might review these packages to familiarize 
myself more with their contents.

	  5.  Virginia Tech has an excellent Statistics department with a 
consulting center.  You might try them.
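
	  Steps 1 through 3 above might be sketched roughly as follows.  
Everything specific here is an assumption made for illustration: the 
simulated stand-in data, the log transform, the AR(1) model (so m = 2 
parameters, "ar1" and "intercept"), the lag of 10 in the Box-Ljung test, 
and applying ks.test to the whitened residuals.  Your series may well 
need a different transformation and model:

```r
library(MASS)  # for boxcox()

set.seed(42)
k <- 5     # number of series
n <- 365   # observations per series (assumed)

# Hypothetical stand-in data: k positive series, AR(1) on the log scale
series <- lapply(seq_len(k), function(i) {
  exp(as.numeric(arima.sim(model = list(ar = 0.7), n = n)))
})
names(series) <- paste0("run", seq_len(k))

## Step 1: normal probability plot and Box-Cox profile for one series
y <- series[[1]]
qqnorm(y); qqline(y)
bc <- boxcox(lm(y ~ 1, y = TRUE), plotit = FALSE)
lambda <- bc$x[which.max(bc$y)]  # a value near 0 suggests a log transform

## Step 2: the same model for every transformed series, then residual checks
fits <- lapply(series, function(s) arima(log(s), order = c(1, 0, 0)))
lb.p <- sapply(fits, function(f) {
  Box.test(residuals(f), lag = 10, type = "Ljung-Box", fitdf = 1)$p.value
})
qqnorm(residuals(fits[[1]])); qqline(residuals(fits[[1]]))

## Step 3: m z-scores per pair plus one KS p.value on the residuals
pairs <- combn(k, 2)  # 2 x 10 matrix of series indices
p.raw <- apply(pairs, 2, function(ij) {
  fi <- fits[[ij[1]]]
  fj <- fits[[ij[2]]]
  z <- (coef(fi) - coef(fj)) / sqrt(diag(vcov(fi)) + diag(vcov(fj)))
  c(2 * pnorm(-abs(z)),
    ks = ks.test(as.numeric(residuals(fi)),
                 as.numeric(residuals(fj)))$p.value)
})
colnames(p.raw) <- apply(pairs, 2, paste, collapse = "-")

# Adjust all 10 * (m + 1) p.values together, under several methods
p.bonf <- p.adjust(p.raw, method = "bonferroni")
p.holm <- p.adjust(p.raw, method = "holm")
p.bh   <- p.adjust(p.raw, method = "BH")

# p.adjust returns a plain vector; restore the (m + 1) x 10 layout to read it
print(round(array(p.holm, dim(p.raw), dimnames(p.raw)), 3))
```

	  Comparing p.bonf, p.holm and p.bh side by side is the kind of check 
I had in mind in step 3: if they lead to different conclusions, the case 
is probably marginal.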

	  hope this helps,
	  Spencer Graves

Kyle Hall wrote:
> I am interested in a statistical comparison of multiple (5) time series 
> generated from modeling software (Hydrologic Simulation Program Fortran). The 
> model output simulates daily bacteria concentration in a stream. The multiple 
> time series are a result of varying our representation of the stream within 
> the model.
> 
> Our main question is: Do the different methods used to represent a stream 
> produce different results at a statistically significant level?
> 
> We want to compare each output time series to determine if there is a 
> difference before looking into the cause within the model.  In a previous 
> study, the Kolmogorov-Smirnov k-sample test was used to compare multiple time 
> series.
> 
> I am unsure about the strength of the Kolmogorov-Smirnov test and I have set 
> out to determine if there are any other tests to compare multiple time 
> series.
> 
> I know that R has the ks.test but I am unsure how this test handles multiple 
> comparisons.  Is there something similar to a pairwise.t.test with a 
> bonferroni correction, only with time series data?
> 
> Does R currently (v 2.3.0) have a comparison test that takes into account the 
> strong serial correlation of time series data?
> 
> 
> Kyle Hall
> 
> Graduate Research Assistant
> Biological Systems Engineering
> Virginia Tech