[R] Code find exact distribution for runs test?

Thu Feb 11 20:19:01 CET 2010

I am not an expert in this area, but here are some thoughts that may get you started towards an answer.

First, there are 2 ways (possibly more) of arriving at the data for a runs test, and they lead to different theoretical distributions:

1. We have a true or hypothesized value of the median that we subtracted from the data. Each value then has a 50% probability of being positive or negative, and all are independent of each other (assuming values exactly equal to the median are impossible or discarded).

2. We have subtracted the sample median from each sample value (and discarded any 0's), leaving exactly half positive and half negative, so the signs are no longer independent.

In case 1, the 1st observation will always start a run.  Each later observation has a 50% chance of having the same sign as the one before it (F) or a different sign (S), independently of the others (under the assumption of random selections from a population with known/hypothesized median).  The number of runs is then 1 plus the number of S's among the n-1 later observations, so the number of runs minus 1 looks like a binomial to me, with n-1 trials and probability 0.5.
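
As a rough sketch of case 1 in R: if the signs are independent with probability 0.5 each, the number of sign changes among the n-1 adjacent pairs is Binomial(n-1, 0.5), and the number of runs is one more than that.  The function name runs_pmf_case1 is only illustrative, and n is assumed to be the number of nonzero observations.

```r
# Exact distribution of the number of runs when each sign is an
# independent fair coin flip (case 1).
# Assumption: n >= 1 is the number of nonzero observations.
runs_pmf_case1 <- function(n) {
    r <- seq_len(n)                                # possible run counts 1..n
    p <- dbinom(r - 1, size = n - 1, prob = 0.5)   # runs - 1 sign changes
    data.frame(runs = r, prob = p, cdf = cumsum(p))
}

runs_pmf_case1(5)
```

The cdf column gives lower-tail probabilities P(runs <= r), which is what a test against "too few runs" would use.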

In case 2, this looks like a hypergeometric-type distribution: there would be n!/((n/2)!(n/2)!) equally likely arrangements of the signs, so we just need to count how many of those arrangements result in x runs to get the probability.  There is probably a way to think about this in terms of balls and urns, but I have not worked that out yet.
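
One way to do the counting for case 2 is the classical combinatorial result for runs of two kinds of objects (usually credited to Wald and Wolfowitz), which is not something worked out in this message.  With n1 values of one sign and n2 of the other, an arrangement with 2k runs has k runs of each sign, and one with 2k+1 runs has k+1 runs of one sign and k of the other.  The sketch below assumes all choose(n1+n2, n1) arrangements are equally likely; runs_pmf_case2 is an illustrative name, and n1 and n2 need not be equal.

```r
# Exact conditional distribution of the number of runs given n1 values of
# one sign and n2 of the other (case 2). Assumes n1 >= 1 and n2 >= 1.
runs_pmf_case2 <- function(n1, n2) {
    total <- choose(n1 + n2, n1)       # number of distinct sign arrangements
    r_max <- if (n1 == n2) 2 * n1 else 2 * min(n1, n2) + 1
    r <- 2:r_max
    k <- r %/% 2
    # even r = 2k: k runs of each sign, and either sign may come first
    n_even <- 2 * choose(n1 - 1, k - 1) * choose(n2 - 1, k - 1)
    # odd r = 2k + 1: one sign has k + 1 runs, the other has k
    n_odd <- choose(n1 - 1, k) * choose(n2 - 1, k - 1) +
             choose(n1 - 1, k - 1) * choose(n2 - 1, k)
    p <- ifelse(r %% 2 == 0, n_even, n_odd) / total
    data.frame(runs = r, prob = p, cdf = cumsum(p))
}

# Applied to the data from the original question
dtemp <- c(12, 13, 12, 11, 5, 2, -1, 2, -1, 3, 2, -6, -7, -7, -12, -9,
           6, 7, 10, 6, 1, 1, 3, 7, -2, -6, -6, -5, -2, -1)
s <- sign(dtemp)                       # this data contains no zeros
d <- runs_pmf_case2(sum(s > 0), sum(s < 0))
r_obs <- length(rle(s)$lengths)        # observed number of runs
d$cdf[d$runs == r_obs]                 # exact lower-tail P(runs <= r_obs)
```

For large nperm, the cdf returned by the Monte Carlo function in the quoted message should be close to the cdf column here, which gives a simple check on both.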

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Dale Steele
> Sent: Wednesday, February 10, 2010 6:16 PM
> To: R-help at r-project.org
> Subject: [R] Code find exact distribution for runs test?
> 
> I've been attempting to understand the one-sample run test for
> randomness.  I've found run.test{tseries} and run.test{lawstat}.  Both
> use a large sample approximation for distribution of the total number
> of runs in a sample of n1 observations of one type and n2 observations
> of another type.
> 
> I've been unable to find R code to generate the exact distribution and
> would like to see how this could be done (not homework).
> 
> For example, given the data:
> 
> dtemp <- c(12, 13, 12, 11, 5, 2, -1, 2, -1, 3, 2, -6, -7, -7, -12, -9,
> 6, 7, 10, 6, 1, 1, 3, 7, -2, -6, -6, -5, -2, -1)
> 
> The Monte Carlo permutation approach seems to get me part way.
> 
> 
> # calculate the number of runs in the data vector
> nruns <- function(x) {
>     signs <- sign(x)
>     runs <- rle(signs)
>     r <- length(runs$lengths)
>     return(r)
> }
> 
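> MC.runs <- function(x, nperm) {
>     RUNS <- numeric(nperm)
>     for (i in seq_len(nperm)) {
>         # number of runs in one random permutation of x
>         RUNS[i] <- nruns(sample(x))
>     }
>     # empirical CDF of the run count over the permutations
>     cdf <- cumsum(table(RUNS)) / nperm
>     return(list(RUNS = RUNS, cdf = cdf, nperm = nperm))
> }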
> 
> MC.runs(dtemp, 100000)
> 
> Thanks.  --Dale
> 