# [R] Randomization tests, grouped data

Charles C. Berry cberry at tajo.ucsd.edu
Fri Jan 11 23:50:34 CET 2008

```
On Fri, 11 Jan 2008, Johannes Hüsing wrote:

> Tom Backer Johnsen <backer at psych.uib.no> [Fri, Jan 11, 2008 at 06:57:41PM CET]:
> [...]
>>> Is there something that can handle this in R?
>>
>
> Have you considered the coin package?
>
>> After a few hours thinking on and off about the problem, I suspect
>> that the question may be stupid or silly (or both).  If that is the
>> case, I would very much like to know why.
>>
>
> I am not quite clear in my thinking anymore, but there are 2^2n
> permutations, of which (2n choose n) happen to yield the same
> effect. These cases are "part of life" and should be counted in
> the permutation test just as well. You might save a little bit of
> computation time by singling these group-preserving permutations
> out, but this is not worth the while at all.
>

It depends (as always...)

Suppose you have two samples with n1 and n2 independent observations,
respectively. You wish to do a two-sample test on each of M variables,
where M is quite large, and you wish to account for multiplicity in
testing. So you construct a permutation test.

If n1 == n2 == 4, there are choose(8,4) == 70 distinct arrangements. By
enumerating them all you get the exact permutation p-value of your test
statistic, and often this is practical.
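
# A minimal sketch of that full enumeration, assuming a difference-in-means
# statistic and made-up data. The names x, idx, stat, perm_stats, obs and
# p_exact are illustrative only.
set.seed(1)
x   <- c(rnorm(4, mean = 1), rnorm(4))   # 8 observations; the first 4 are group 1
idx <- combn(8, 4)                        # 4 x 70 matrix, one column per choice of group 1
stat <- function(i) mean(x[i]) - mean(x[-i])
perm_stats <- apply(idx, 2, stat)         # statistic under each of the 70 arrangements
obs <- stat(1:4)                          # statistic for the observed grouping
# Two-sided exact p-value; the small tolerance guards against rounding error
# so that the observed arrangement always counts itself.
p_exact <- mean(abs(perm_stats) >= abs(obs) - 1e-12)
p_exact                                   # a multiple of 1/70, never below 2/70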

But if you sample (say) 70 of the factorial(8) orderings at random, you
will likely miss some of the 70 distinct arrangements and repeat others.
The number 0.632 comes to mind as the fraction of distinct arrangements
that will actually show up (see Efron and Tibshirani, An Introduction to
the Bootstrap, to check if this is right).
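
# A quick check of that fraction. Each of the 70 arrangements corresponds to
# factorial(4) * factorial(4) == 576 of the factorial(8) == 40320 orderings,
# so a random ordering picks an arrangement uniformly. Assuming the 70 draws
# are independent (with replacement):
1 - (1 - 1/70)^70      # expected fraction of distinct arrangements, about 0.635
1 - exp(-1)            # the large-N limit, about 0.632
# Simulation sketch of the same quantity (draws is an illustrative name):
draws <- replicate(10000, length(unique(sample.int(70, 70, replace = TRUE))))
mean(draws) / 70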

To get an accurate p-value by sampling from the factorial(8) orderings,
you would need a sample much larger than the 70 distinct arrangements.

OTOH, if the number of distinct arrangements is far larger than the number
you could afford to enumerate, then sampling from the factorial(n1+n2)
orderings and sampling from the choose(n1+n2,n2) distinct arrangements
are nearly equivalent. You could use the finite population correction to
ascertain just how different they are, I think.
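
# A rough sketch of that comparison, assuming the quantity of interest is the
# variance of a p-value estimated from m sampled arrangements. The values of
# n1, n2 and m and the name fpc are illustrative assumptions.
n1 <- 20; n2 <- 20
N  <- choose(n1 + n2, n2)    # number of distinct arrangements, about 1.4e11
m  <- 10000                  # number of arrangements actually sampled
# Usual finite population correction: ratio of the variance when sampling
# distinct arrangements (without replacement) to the variance when sampling
# orderings (equivalent to sampling arrangements with replacement).
fpc <- (N - m) / (N - 1)
fpc                          # very close to 1: the two schemes barely differ
# With N == 70 and m == 70 the same ratio is 0, i.e. full enumeration.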

HTH,

Chuck

> --
> Johannes Hüsing               There is something fascinating about science.
>                              One gets such wholesale returns of conjecture
> mailto:johannes at huesing.name  from such a trifling investment of fact.
> http://derwisch.wikidot.com         (Mark Twain, "Life on the Mississippi")