The alternative hypothesis in permutation testing

In this article, we discuss a key difference between the traditional framework for null hypothesis significance testing (NHST) and the permutation framework for NHST. This difference lies at the very root of each framework: the specification of the null and alternative hypotheses. First, we review the traditional approach to NHST. Second, we explain how the use of the permutation framework requires particular care when formulating the null and alternative hypotheses. Finally, we discuss an approach proposed by Pesarin and Salmaso (2010), called Non-Parametric Combination (NPC), which allows one to combine several test statistics into a single test.

Traditional NHST

The traditional approach to NHST consists of specifying a null hypothesis \(H_0\) that we would like to reject in favor of an alternative hypothesis \(H_a\), given statistical evidence in the form of data samples. For example, if we study the effect of some drug on the blood sugar level of patients diagnosed with diabetes, we might draw two samples from two distinct populations, one that received a placebo and one that received the treatment. The goal is then to show that the average blood sugar level in the treatment group is lower than in the placebo group. Hence, a suitable test for answering this question is given by the following hypotheses:

\[ H_0: \mu_\mathrm{treatment} \ge \mu_\mathrm{placebo} \quad \mbox{against} \quad H_a: \mu_\mathrm{treatment} < \mu_\mathrm{placebo} \]

As intuition suggests, the alternative hypothesis is determined first, on the basis of what we aim to show, and the null hypothesis \(H_0\) is then deduced as the complement of \(H_a\).
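To make the one-sided setup concrete, here is a minimal Python sketch of such a test. It is not taken from the original analysis: the function name `one_sided_z_test`, the simulated blood sugar values and the large-sample normal approximation (used here in place of a t distribution so that only the standard library is needed) are all assumptions made for illustration.

```python
import random
from statistics import NormalDist, mean, variance


def one_sided_z_test(treatment, placebo):
    """Approximate test of H0: mu_treatment >= mu_placebo
    against Ha: mu_treatment < mu_placebo.

    Assumption: samples are large enough for the standardized
    difference of means to be roughly standard normal.
    """
    # Standard error of the difference of means (unequal variances allowed).
    se = (variance(treatment) / len(treatment)
          + variance(placebo) / len(placebo)) ** 0.5
    z = (mean(treatment) - mean(placebo)) / se
    # Small (very negative) z supports Ha, so the p-value is the lower tail.
    p_value = NormalDist().cdf(z)
    return z, p_value


# Simulated, purely illustrative data (not real measurements).
rng = random.Random(0)
placebo = [rng.gauss(150, 20) for _ in range(200)]
treatment = [rng.gauss(140, 20) for _ in range(200)]

z, p_value = one_sided_z_test(treatment, placebo)
print(f"z = {z:.3f}, one-sided p-value = {p_value:.4g}")
```

Note how the direction of \(H_a\) dictates which tail of the reference distribution is used: swapping the two arguments tests the opposite one-sided alternative.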

Permutation NHST

The permutation framework completely redefines the null and alternative hypotheses with respect to the traditional approach:

Non-Parametric Combination

Once you have your sample of \(m\) permutations out of the \(m_t\) possible ones, you can compute the values of as many test statistics \(T^{(1)}, \dots, T^{(L)}\) as you want. You can then use the unbiased estimator \(\widehat{p_\infty}^{(\ell)} = \frac{B^{(\ell)}}{m}\) of the p-value \(p_\infty^{(\ell)} = \mathbb{P} \left( T^{(\ell)} \ge t_\mathrm{obs}^{(\ell)} \right)\) for each test statistic to produce \(L\) p-value estimates, each one targeting a different aspect of the distributions under investigation. Because the evidence is summarized by p-values, all \(L\) estimates are on the same scale even though they might look at very different features of the distributions. They can therefore be combined in various ways to provide a single test statistic value to be used in the testing procedure. Several combining functions are available for merging p-values in this way. The package flipr currently implements:
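The two combining functions named in this section, Tippett's and Fisher's, are commonly written as follows. This is a sketch in the notation of this article, following the usual definitions in Pesarin and Salmaso (2010); the symbols \(T_\mathrm{Tippett}\) and \(T_\mathrm{Fisher}\) are introduced here for illustration, and the exact form used internally by flipr is assumed rather than quoted:

\[ T_\mathrm{Tippett} = 1 - \min_{\ell = 1, \dots, L} \widehat{p_\infty}^{(\ell)}, \qquad T_\mathrm{Fisher} = -2 \sum_{\ell = 1}^{L} \log \widehat{p_\infty}^{(\ell)} \]

In both cases, large values of the combined statistic indicate evidence against the null hypothesis: Tippett's function reacts to the single smallest p-value, whereas Fisher's accumulates evidence across all \(L\) of them.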

The choice of the combining function is made through the optional argument combining_function which takes a string as value. At the moment, it accepts either "tippett" or "fisher" for picking one of the two above-mentioned combining functions.
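The whole procedure can be sketched in a few lines of Python. This is not flipr's implementation: the function `npc_test`, its arguments and the two example statistics are hypothetical names chosen for illustration, and only the `combining_function` argument mirrors the one described above. The sketch also assumes that the observed statistics are included in the reference set, so that each partial p-value is \((B^{(\ell)} + 1) / (m + 1)\) rather than \(B^{(\ell)} / m\); this keeps every p-value strictly positive, which Fisher's logarithm requires.

```python
import math
import random
from bisect import bisect_left
from statistics import mean, variance


def tippett(pvalues):
    return 1.0 - min(pvalues)


def fisher(pvalues):
    return -2.0 * sum(math.log(p) for p in pvalues)


COMBINERS = {"tippett": tippett, "fisher": fisher}


def upper_tail_pvalues(values):
    """For each value, the fraction of all values that are >= it."""
    ordered = sorted(values)
    n = len(ordered)
    return [(n - bisect_left(ordered, v)) / n for v in values]


def npc_test(x, y, stat_functions, combining_function="tippett", m=999, seed=0):
    """Two-sample permutation test combining several statistics (sketch)."""
    combine = COMBINERS[combining_function]
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n_x = len(x)
    # Row 0 holds the observed statistics; rows 1..m hold permuted ones.
    rows = [[stat(x, y) for stat in stat_functions]]
    for _ in range(m):
        rng.shuffle(pooled)
        rows.append([stat(pooled[:n_x], pooled[n_x:]) for stat in stat_functions])
    # One column of partial p-values per test statistic.
    columns = [upper_tail_pvalues([row[j] for row in rows])
               for j in range(len(stat_functions))]
    # Combine the partial p-values row by row, then compare to the observed row.
    combined = [combine(ps) for ps in zip(*columns)]
    partial_observed = [column[0] for column in columns]
    p_combined = sum(c >= combined[0] for c in combined) / len(combined)
    return partial_observed, p_combined


# Hypothetical statistics: one sensitive to location, one to scale.
def abs_mean_diff(a, b):
    return abs(mean(a) - mean(b))


def abs_log_var_ratio(a, b):
    return abs(math.log(variance(a) / variance(b)))


# Simulated, purely illustrative data.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(30)]
y = [rng.gauss(1, 2) for _ in range(30)]
partial, p = npc_test(x, y, [abs_mean_diff, abs_log_var_ratio],
                      combining_function="fisher")
print("partial p-values:", partial, "combined p-value:", p)
```

The key design point is that the same set of permutations is reused for every statistic and for the combined statistic, so the dependence between the partial tests is handled by the permutation distribution itself rather than by an explicit model.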

References

Pesarin, Fortunato, and Luigi Salmaso. 2010. “Permutation Tests for Complex Data,” March. https://doi.org/10.1002/9780470689516.