[BioC] Wilcoxon test [was loged data or not loged previous to
use normalize.quantile]
Claus Mayer
claus at bioss.ac.uk
Wed Apr 6 19:07:49 CEST 2005
Two comments on this discussion:
Gordon Smyth wrote:
>
>
> Over many years as a statistician, I've heard it said so many times
> "the variances were not equal so I used a Wilcoxon two-sample test
> instead of a t-test" or "I used a rank test which is assumption free".
> Like Naomi, I find it frustrating that this misunderstanding is so
> common. The fact is that all tests make some assumptions, and
> inequality of population variances under the null hypothesis breaks
> the Wilcoxon test just as it does the pooled t-test. I don't know
> which test breaks down more quickly -- I certainly haven't seen any
> evidence that the Wilcoxon test is more robust than the t-test to
> inequality of variances.
I guess the explanation for this misunderstanding is that there are (at
least) two different kinds of variance heterogeneity. The first is
caused by a relationship between mean and variance, which can often be
removed by an appropriate transformation (e.g. log-transformation or
VSN in the case of microarray intensities). In this case the Wilcoxon
test, or any other rank test, really is robust against the variance
heterogeneity, because it is invariant under any strictly monotone
transformation.
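A minimal sketch of this invariance, in Python for illustration only.
The simulated log-normal "intensities" and the helper name rank_sum are
assumptions made for the example, not anything from the thread. It
computes the Wilcoxon rank-sum statistic on the raw scale and again
after a log2 transformation:

```python
import math
import random
from bisect import bisect_left, bisect_right


def rank_sum(x, y):
    """Wilcoxon rank-sum statistic: sum of the ranks of x in the pooled
    sample, using midranks if there are ties."""
    pooled = sorted(x + y)

    def midrank(v):
        lo = bisect_left(pooled, v)    # number of pooled values below v
        hi = bisect_right(pooled, v)   # number of pooled values <= v
        return (lo + 1 + hi) / 2       # average of the ranks lo+1 .. hi

    return sum(midrank(v) for v in x)


# Hypothetical intensities: log-normal, so the spread grows with the mean.
rng = random.Random(1)
group_a = [rng.lognormvariate(6.0, 0.5) for _ in range(8)]
group_b = [rng.lognormvariate(7.0, 0.5) for _ in range(8)]

w_raw = rank_sum(group_a, group_b)
w_log = rank_sum([math.log2(v) for v in group_a],
                 [math.log2(v) for v in group_b])

print("rank sum on raw scale: ", w_raw)
print("rank sum on log2 scale:", w_log)
```

Because log2 is strictly increasing, the ordering of the pooled values
is unchanged, so both calls see the same ranks and return the same
statistic. A t-statistic computed on the two scales would in general
differ.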
The second type of variance heterogeneity is the one Gordon talks about,
i.e. differences in variances that can also occur under the null
hypothesis of equal means. This problem cannot be resolved by a
transformation, and thus there is no obvious reason why rank tests should
be more robust or better in this case.
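A rough simulation sketch of this second case, again in Python and
under assumptions chosen purely for illustration: normal data with
equal means, group sizes 10 and 40, the normal approximation to the
Mann-Whitney U statistic, and a nominal two-sided 5% level. The
function names are hypothetical.

```python
import math
import random


def mann_whitney_z(x, y):
    """z score of the Mann-Whitney U statistic under the usual null
    variance (continuous data, so ties are ignored)."""
    n1, n2 = len(x), len(y)
    u = sum(1 for a in x for b in y if a > b)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean_u) / sd_u


def rejection_rate(n1, sd1, n2, sd2, n_sim=2000, seed=0):
    """Fraction of simulated data sets, all with equal means, in which
    the two-sided test rejects at the nominal 5% level."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        x = [rng.gauss(0.0, sd1) for _ in range(n1)]
        y = [rng.gauss(0.0, sd2) for _ in range(n2)]
        if abs(mann_whitney_z(x, y)) > 1.96:
            rejections += 1
    return rejections / n_sim


equal_var = rejection_rate(10, 1.0, 40, 1.0)
unequal_var = rejection_rate(10, 4.0, 40, 1.0)

print("equal variances:  ", equal_var)
print("unequal variances:", unequal_var)
```

With the larger variance in the smaller group, as here, the rejection
rate should come out noticeably above the nominal 5%. Putting the larger
variance in the larger group tends to make the test conservative
instead. No monotone transformation changes this, since the ranks stay
the same.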
>>
>> With respect to permutations tests...
>>
>> I'm under the impression that you only need independence, not the
>> assumption of
>> constant variance.
>
>
> No, independence is not enough, as you say yourself in the next sentence.
Pollard and van der Laan have investigated resampling tests under
variance inequality. Although permutation methods do not give exact
control of type 1 errors for unequal variances (or, more generally, for
data that are not independent and identically distributed), they control
this error asymptotically for balanced sample sizes. A simulation study
in their paper confirmed this, so as long as the sample sizes are
balanced you might not be too badly off using a permutation method.
Otherwise bootstrapping (within group, based on the residuals) should be
preferred.
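The role of balance can be illustrated with a small simulation. This is
only a sketch under assumptions of my own choosing (normal data, equal
means, standard deviations 4 and 1, a permutation test on the
unstudentized difference in means, 199 random permutations); it is not
a reproduction of the Pollard and van der Laan study, and the function
names are hypothetical.

```python
import random


def perm_test_rejects(x, y, rng, n_perm=199, alpha=0.05):
    """Two-sided Monte Carlo permutation test on the difference in
    group means; returns True if the p-value is at most alpha."""
    n1, n2 = len(x), len(y)
    pooled = x + y
    total = sum(pooled)

    def stat(sum1):
        # |mean of group 1 - mean of group 2| from the group-1 sum alone
        return abs(sum1 / n1 - (total - sum1) / n2)

    observed = stat(sum(x))
    extreme = 1  # the observed labelling counts as one permutation
    for _ in range(n_perm):
        # drawing n1 values without replacement is a random relabelling
        if stat(sum(rng.sample(pooled, n1))) >= observed:
            extreme += 1
    return extreme / (n_perm + 1) <= alpha


def type1_rate(n1, sd1, n2, sd2, n_sim=200, seed=0):
    """Rejection rate when both groups have mean zero."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        x = [rng.gauss(0.0, sd1) for _ in range(n1)]
        y = [rng.gauss(0.0, sd2) for _ in range(n2)]
        hits += perm_test_rejects(x, y, rng)
    return hits / n_sim


balanced = type1_rate(25, 4.0, 25, 1.0)
unbalanced = type1_rate(10, 4.0, 40, 1.0)

print("balanced design (25 vs 25):  ", balanced)
print("unbalanced design (10 vs 40):", unbalanced)
```

In the balanced design the rejection rate should stay close to the
nominal 5%, while in the unbalanced design, with the larger variance in
the smaller group, it should be clearly inflated. The number of
simulations is kept small so that the sketch runs quickly, so the
estimated rates are noisy.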
Regards,
Claus
--
***********************************************************************************
Claus-D. Mayer | http://www.bioss.ac.uk
Biomathematics & Statistics Scotland | email: claus at bioss.ac.uk
Rowett Research Institute | Telephone: +44 (0) 1224 716652
Aberdeen AB21 9SB, Scotland, UK. | Fax: +44 (0) 1224 715349