[R] ANOVA Permutation Test
Meyners, Michael
meyners.m sending from pg.com
Mon Sep 3 18:06:51 CEST 2018
Juan,
Your question might be borderline for this list, as it ultimately seems to be a stats question in R disguise.
Anyway, the short answer is that you *expect* to get a different p value from a permutation test unless you are able to run all possible permutations and therefore use the so-called systematic reference set. That is rarely the case, and only feasible for relatively small problems.
The permutation test uses a random subset of all possible permutations. Given this randomness, you'll get a different p value on each run. To get reproducible results, you may specify a seed (?set.seed), yet that is only reproducible within this environment: someone else with different software and/or code may come up with a different p. The larger the number of permutations used, the smaller the variation in the p values, however. For most applications, 1000 permutations seem good enough to me, but sometimes I go higher (in particular if the p value is borderline and I really need a strict above/below-alpha decision).
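A minimal sketch of this seed-based reproducibility with aovp() itself, assuming lmPerm is installed; the data frame and variable names below are toy examples, not the poster's data:

```r
library(lmPerm)

## Illustrative, non-normal toy data (two groups)
set.seed(1)
d <- data.frame(
  y = c(rexp(10, 1), rexp(10, 0.5)),
  g = factor(rep(c("A", "B"), each = 10))
)

## Same seed before each call => same random permutations => same p value
set.seed(123)
fit1 <- summary(aovp(y ~ g, data = d))

set.seed(123)
fit2 <- summary(aovp(y ~ g, data = d))

identical(fit1, fit2)   # same seed gives the same summary

## Without resetting the seed, a rerun draws new permutations and the
## Monte Carlo p value will typically differ slightly.
```

Note that the seed fixes only this R session's random number stream; a different implementation of the same test will still report a (slightly) different p.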
The permutations do not create an implicit normal distribution, but rather a null distribution that can be (and, depending on the non-normality of your data, likely is) non-normal. So that proposal does not seem appealing.
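To see this, one can build the permutation null distribution of the F statistic by hand; it is right-skewed rather than normal. A sketch with illustrative data and group sizes (not the poster's):

```r
## Skewed responses, three groups of equal size (toy example)
set.seed(1)
y <- rexp(24)
g <- factor(rep(1:3, each = 8))

## Null distribution: F statistics recomputed after permuting the responses
perm_F <- replicate(2000, {
  y_star <- sample(y)   # break any association between response and group
  summary(aov(y_star ~ g))[[1]][["F value"]][1]
})

## A normal distribution is symmetric about its mean; this one is not:
## well under half of the permuted F values lie above the mean.
mean(perm_F > mean(perm_F))

hist(perm_F, breaks = 40, main = "Permutation null of F (right-skewed)")
```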
I don't think you need an alternative - the permutation test is just fine, and once you recognize the randomness in the execution, the (relatively small) variability in p values is not a major issue.
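How small that variability is can be quantified: a Monte Carlo p value based on B permutations is a binomial proportion, so its standard error is roughly sqrt(p(1-p)/B). A quick sketch (the p value of 0.05 is an assumed example):

```r
## Standard error of a Monte Carlo p value estimate as B grows
p <- 0.05
for (B in c(100, 1000, 10000)) {
  cat(B, "permutations: SE of p ~", signif(sqrt(p * (1 - p) / B), 3), "\n")
}
```

So with 1000 permutations a p value near 0.05 wanders by well under 0.01 between runs, which is rarely decision-relevant unless you sit right at alpha.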
You may want to have a look at the textbook by Edgington & Onghena for details on permutation tests, and there are plenty of papers out there addressing them in various contexts, which will help you understand *why* you observe what you observe here.
HTH, Michael
> -----Original Message-----
> From: R-help <r-help-bounces@r-project.org> On Behalf Of Juan Telleria Ruiz
> de Aguirre
> Sent: Monday, 3 September 2018 17:18
> To: R help Mailing list <r-help@r-project.org>
> Subject: [R] ANOVA Permutation Test
>
> Dear R users,
>
> I have the following Question related to Package lmPerm:
>
> This package uses a modified version of aov() function, which uses
> Permutation Tests instead of Normal Theory Tests for fitting an Analysis of
> Variance (ANOVA) Model.
>
> However, when I run the following code for a simple linear model:
>
> library(lmPerm)
> library(magrittr)  # for the %>% pipe
>
> e$t_Downtime_per_Intervention_Successful %>%
>   aovp(
>     formula = `Downtime per Intervention[h]` ~ `Working Hours`,
>     data = .
>   ) %>%
>   summary()
>
> I obtain different p-values for each run!
>
> With a regular ANOVA Test, I obtain instead a constant F-statistic, but I do not
> fulfill the required Normality Assumptions.
>
> So my questions are:
>
> Would it still be possible to use the regular aov() by generating
> permutations in advance (thereby obtaining a Normal Distribution thanks to
> the Central Limit Theorem) and applying the aov() function afterwards? Does
> that make sense?
>
>
> Or maybe this issue could be due to unbalanced classes? I also tried
> weighting observations based on proportions, but the function failed.
>
>
> Any alternative solution for performing a One-Way ANOVA Test over
> Non-Normal Data?
>
>
> Thank you.
>
> Juan
>