[R] Code find exact distribution for runs test?
Greg Snow
Greg.Snow at imail.org
Fri Feb 12 19:33:53 CET 2010
Here is one quick way using the combinat package:
> library(combinat)
>
> tmpfun <- function(x) {
+ tmp <- rep(1,5)
+ tmp[x] <- -1
+ tmp
+ }
>
> combn(5,2, tmpfun)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] -1 -1 -1 -1 1 1 1 1 1 1
[2,] -1 1 1 1 -1 -1 -1 1 1 1
[3,] 1 -1 1 1 -1 1 1 -1 -1 1
[4,] 1 1 -1 1 1 -1 1 -1 1 -1
[5,] 1 1 1 -1 1 1 -1 1 -1 -1
>
Of course in this case the tmpfun function needs to be rewritten for each vector size, so is not generalizable.
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
> -----Original Message-----
> From: Dale Steele [mailto:dale.w.steele at gmail.com]
> Sent: Friday, February 12, 2010 6:34 AM
> To: Greg Snow
> Cc: R-help at r-project.org
> Subject: Re: [R] Code find exact distribution for runs test?
>
> Wow! Your reply amply demonstrates the power of understanding the
> math (or finding someone who does) before turning on the computer.
>
> One last R question...how could I efficiently enumerate all
> distinguishable permutations of a vector of signs?
>
> vect <- c( -1, -1, 1, 1, 1)
>
> permn(vect) #length: factorial(length(vect))
> ???? #length: choose(n, n1)
>
> Best Regards. --Dale
>
>
> On Thu, Feb 11, 2010 at 10:04 PM, Greg Snow <Greg.Snow at imail.org>
> wrote:
> > Try this:
> >
> > druns2 <- function(x, n1, n2) {
> >
> > if( x %% 2 ) { # odd x
> > choose( n1 - 1, n1 - (x+1)/2 ) * choose( n2 - 1, n2 -
> (x-1)/2 ) / choose( n1+n2, n1 ) +
> > choose( n2 - 1, n2 - (x+1)/2 ) * choose( n1 - 1, n1 -
> (x-1)/2 ) / choose( n1+n2, n2 )
> > } else { # even x
> > choose( n1 - 1, n1 - x/2 ) * choose( n2 - 1, n2 - x/2
> ) / choose( n1 + n2, n1 ) +
> > choose( n2 - 1, n2 - x/2 ) * choose( n1 - 1, n1 - x/2
> ) / choose( n1 + n2, n2 )
> > }
> > }
> >
> > exp.2 <- sapply( 2:7, druns2, n1=3, n2=4 )
> > exp.2 - exp.r/factorial(7)
> >
> >
> >
> > --
> > Gregory (Greg) L. Snow Ph.D.
> > Statistical Data Center
> > Intermountain Healthcare
> > greg.snow at imail.org
> > 801.408.8111
> >
> >
> >> -----Original Message-----
> >> From: Dale Steele [mailto:dale.w.steele at gmail.com]
> >> Sent: Thursday, February 11, 2010 5:28 PM
> >> To: Greg Snow
> >> Cc: R-help at r-project.org
> >> Subject: Re: [R] Code find exact distribution for runs test?
> >>
> >> Thanks for this! My original query was probably unclear. I think
> you
> >> have answered a different, possibly more interesting question. My
> >> goal was to find an exact distribution of a total number of runs R
> in
> >> samples of two types of size (n1, n2) under the null hypothesis of
> >> randomness.
> >>
> >> The horribly inefficient code below generates results which match
> >> Table 10 in Wackerly, Mendehall and Scheaffer for the distribution
> of
> >> the total number of runs R in samples of size (n1, n2); P(R <= r),
> >> under the null hypothesis of randomness. Horribly inefficient
> because
> >> couldn't figure out how to generate only the distinguishable
> >> permutations of my sample vector. Hope this brute force approach
> >> illustrates what I am after.
> >>
> >>
> >> nruns <- function(x) {
> >> signs <- sign(x)
> >> runs <- rle(signs)
> >> r <- length(runs$lengths)
> >> return(r)
> >> }
> >>
> >> library(combinat)
> >> # for example (n1=3, n2=4)
> >> n1 <- 3; n2 <- 4; n <- n1+n2
> >> vect <- rep( c(-1,1), c(3,4))
> >> vect
> >>
> >> # ! 'nruns(vect)' generates factorial(7) permutations, only
> >> choose(7,3) are distinguishable
> >>
> >> exp.r <- table(unlist(permn(vect, nruns )))
> >> cumsum(dist/factorial(7))
> >>
> >> # Result agrees with Table 10, pg 814 (n1=3, n2=4)
> >> #in Wackerly, Mendenhall and Scheaffer, 2002, presumably done using
> >> # exact calculations.
> >>
> >> Thanks. --Dale
> >>
> >> On Thu, Feb 11, 2010 at 6:08 PM, Greg Snow <Greg.Snow at imail.org>
> wrote:
> >> > OK, I think I have it worked out for both cases:
> >> >
> >> > library(vcd)
> >> >
> >> > druns <- function(x, n) { # x runs in n data points, not
> vectorized
> >> > # based on sample
> >> median
> >> >
> >> > if( n%%2 ) stop('n must be even')
> >> >
> >> > if( x %% 2 ) { # odd x
> >> > choose( n/2 - 1, n/2-(x+1)/2 ) * choose( n/2 - 1,
> n/2-
> >> (x-1)/2 )/
> >> > choose(n,n/2) * 2
> >> > } else { # even x
> >> > choose( n/2 - 1, n/2-x/2 ) * choose( n/2 - 1, n/2-
> x/2
> >> )/
> >> > choose(n,n/2) * 2
> >> > }
> >> > }
> >> >
> >> > ## population median
> >> > out1 <- replicate( 100000, {x <- rnorm(10);
> >> length(rle(sign(x))$lengths) } )
> >> >
> >> > rootogram( table(out1), dbinom(0:9, size=9, prob=0.5)*100000 )
> >> > chisq.test( table(out1), p=dbinom(0:9, size=9, prob=0.5) )
> >> >
> >> >
> >> > ## sample median
> >> > out2 <- replicate( 100000, {x <- rnorm(10); x <- x - median(x);
> >> length(rle(sign(x))$lengths) } )
> >> >
> >> > fit <- sapply( 2:10, druns, n=10 )
> >> >
> >> > rootogram( table(out2), fit * 100000 )
> >> > chisq.test( table(out2), p=fit )
> >> >
> >> >
> >> > The tests only fail to prove me wrong, not a firm proof that I am
> >> correct. But given the simulation size I am at least pretty close.
> I
> >> can see why people worked out approximations before we had good
> >> computers, I would not want to calculate the above by hand.
> >> >
> >> > Enjoy,
> >> >
> >> > --
> >> > Gregory (Greg) L. Snow Ph.D.
> >> > Statistical Data Center
> >> > Intermountain Healthcare
> >> > greg.snow at imail.org
> >> > 801.408.8111
> >> >
> >> >
> >> >> -----Original Message-----
> >> >> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> >> >> project.org] On Behalf Of Greg Snow
> >> >> Sent: Thursday, February 11, 2010 12:19 PM
> >> >> To: Dale Steele; R-help at r-project.org
> >> >> Subject: Re: [R] Code find exact distribution for runs test?
> >> >>
> >> >> I am not an expert in this area, but here are some thoughts that
> may
> >> >> get you started towards an answer.
> >> >>
> >> >> First, there are 2 ways (possibly more) that can lead to the data
> >> for a
> >> >> runs test that lead to different theoretical distributions:
> >> >>
> >> >> 1. We have a true or hypothesized value of the median that we
> >> >> subtracted from the data, therefore each value has 50%
> probability
> >> of
> >> >> being positive/negative and all are independent of each other
> >> (assuming
> >> >> being exactly equal to the median is impossible or discarded).
> >> >>
> >> >> 2. We have subtracted the sample median from each sample value
> (and
> >> >> discarded any 0's) leaving us with exactly half positive and half
> >> >> negative and not having independence.
> >> >>
> >> >> In case 1, the 1st observation will always start a run. The
> second
> >> >> observation has a 50% chance of being the same sign (F) or
> different
> >> >> sign (S), with the probability being 0.5 for each new observation
> >> and
> >> >> them all being independent (under assumption of random selections
> >> from
> >> >> population with known/hypothesized median) and the number of runs
> >> >> equaling the number of S's, this looks like a binomial to me
> (with
> >> some
> >> >> '-1's inserted in appropriate places.
> >> >>
> >> >> In case 2, this looks like a hypergeometric distribution, there
> >> would
> >> >> be n!/((n/2)!(n/2)!) possible permutations, just need to compute
> how
> >> >> many of those permutations result in x runs to get the
> probability.
> >> >> There is probably a way to think about this in terms of balls and
> >> urns,
> >> >> but I have not worked that out yet.
> >> >>
> >> >> Hope this helps,
> >> >>
> >> >> --
> >> >> Gregory (Greg) L. Snow Ph.D.
> >> >> Statistical Data Center
> >> >> Intermountain Healthcare
> >> >> greg.snow at imail.org
> >> >> 801.408.8111
> >> >>
> >> >>
> >> >> > -----Original Message-----
> >> >> > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> >> >> > project.org] On Behalf Of Dale Steele
> >> >> > Sent: Wednesday, February 10, 2010 6:16 PM
> >> >> > To: R-help at r-project.org
> >> >> > Subject: [R] Code find exact distribution for runs test?
> >> >> >
> >> >> > I've been attempting to understand the one-sample run test for
> >> >> > randomness. I've found run.test{tseries} and
> run.test{lawstat}.
> >> >> Both
> >> >> > use a large sample approximation for distribution of the total
> >> number
> >> >> > of runs in a sample of n1 observations of one type and n2
> >> >> observations
> >> >> > of another type.
> >> >> >
> >> >> > I've been unable to find R code to generate the exact
> distribution
> >> >> and
> >> >> > would like to see how this could be done (not homework).
> >> >> >
> >> >> > For example, given the data:
> >> >> >
> >> >> > dtemp <- c(12, 13, 12, 11, 5, 2, -1, 2, -1, 3, 2, -6, -7, -7, -
> 12,
> >> -
> >> >> 9,
> >> >> > 6, 7, 10, 6, 1, 1, 3, 7, -2, -6, -6, -5, -2, -1)
> >> >> >
> >> >> > The Monte Carlo permutation approach seems to get me part way.
> >> >> >
> >> >> >
> >> >> > # calculate the number of runs in the data vector
> >> >> > nruns <- function(x) {
> >> >> > signs <- sign(x)
> >> >> > runs <- rle(signs)
> >> >> > r <- length(runs$lengths)
> >> >> > return(r)
> >> >> > }
> >> >> >
> >> >> > MC.runs <- function(x, nperm) {
> >> >> > RUNS <- numeric(nperm)
> >> >> > for (i in 1:nperm) {
> >> >> > RUNS[i] <- nruns(sample(x))
> >> >> > }
> >> >> > cdf <- cumsum(table(RUNS))/nperm
> >> >> > return(list(RUNS=RUNS, cdf=cdf, nperm=nperm))
> >> >> > }
> >> >> >
> >> >> > MC.runs(dtemp, 100000)
> >> >> >
> >> >> > Thanks. --Dale
> >> >> >
> >> >> > ______________________________________________
> >> >> > R-help at r-project.org mailing list
> >> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> >> > PLEASE do read the posting guide http://www.R-
> project.org/posting-
> >> >> > guide.html
> >> >> > and provide commented, minimal, self-contained, reproducible
> code.
> >> >>
> >> >> ______________________________________________
> >> >> R-help at r-project.org mailing list
> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> >> PLEASE do read the posting guide http://www.R-
> project.org/posting-
> >> >> guide.html
> >> >> and provide commented, minimal, self-contained, reproducible
> code.
> >> >
> >
More information about the R-help
mailing list