[BioC] Wilcoxon test [was loged data or not loged previous to use normalize.quantile]

Wed Apr 6 23:58:25 CEST 2005

On Thu, April 7, 2005 3:07 am, Claus Mayer said:
> Two comments on this discussion:
>
> Gordon Smyth wrote:
>
>> Over many years as a statistician, I've heard it said so many times
>> "the variances were not equal so I used a Wilcoxon two-sample test
>> instead of a t-test" or "I used a rank test which is assumption free".
>> Like Naomi, I find it frustrating that this misunderstanding is so
>> common. The fact is that all tests make some assumptions, and
>> inequality of population variances under the null hypothesis breaks
>> the Wilcoxon test just as it does the pooled t-test. I don't know
>> which test breaks down more quickly -- I certainly haven't seen any
>> evidence that the Wilcoxon test is more robust than the t-test to
>> inequality of variances.
>
> I guess the explanation for this misunderstanding is that there are (at
> least) two different kinds of variance heterogeneity. The first one is
> that caused by a relationship between mean and variance, which can
> often be removed by an appropriate transformation (e.g. log-transformation
> or VSN in the case of microarray intensities). In this case the Wilcoxon
> test or any other rank test really is robust against this variance
> heterogeneity, because it is invariant under any monotone transformation.

Agreed.
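
A minimal sketch of that invariance, in Python. The intensity values and the helper names rank_sum and pooled_t are made up for illustration and are not from any package discussed in this thread. The rank-sum statistic depends only on the ordering of the pooled data, so a log transformation leaves it unchanged, whereas the pooled t statistic changes.

```python
import math
import statistics

def rank_sum(x, y):
    """Wilcoxon rank-sum statistic: sum of the ranks of x in the pooled sample."""
    pooled = sorted(x + y)

    def midrank(v):
        # Average of the 1-based positions at which v occurs (handles ties).
        lo = pooled.index(v) + 1
        hi = lo + pooled.count(v) - 1
        return (lo + hi) / 2

    return sum(midrank(v) for v in x)

def pooled_t(x, y):
    """Two-sample t statistic with a pooled variance estimate."""
    n1, n2 = len(x), len(y)
    s2 = ((n1 - 1) * statistics.variance(x)
          + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(s2 * (1 / n1 + 1 / n2))

# Hypothetical raw intensities for two groups (illustration only).
x = [120.0, 340.0, 95.0, 410.0, 150.0]
y = [800.0, 560.0, 1300.0, 230.0, 990.0]

log_x = [math.log2(v) for v in x]
log_y = [math.log2(v) for v in y]

w_raw, w_log = rank_sum(x, y), rank_sum(log_x, log_y)
t_raw, t_log = pooled_t(x, y), pooled_t(log_x, log_y)

print("rank sum, raw vs log2:", w_raw, w_log)   # identical
print("pooled t, raw vs log2:", t_raw, t_log)   # different
```

The same holds for any strictly increasing transformation applied to both groups, which is why a mean-variance relationship that a transformation would remove does not trouble a rank test.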

> The second type of variance heterogeneity is the one Gordon talks about,
> i.e. differences in variances that can also occur under the
> null hypothesis of equal means. This problem cannot be resolved by a
> transformation and thus there is no obvious reason why rank tests should
> be more robust/better in this case.
>
>>> With respect to permutations tests...
>>>
>>> I'm under the impression that you only need independence, not the
>>> assumption of constant variance.
>>
>>
>> No, independence is not enough, as you say yourself in the next sentence.
>
> Pollard and van der Laan have investigated resampling tests under
> variance inequality. Although permutation methods do not give exact
> control of type 1 errors for unequal variances (or, more generally, for
> data that are not independent and identically distributed), they control
> this error asymptotically for balanced sample sizes. A simulation study in
> their paper confirmed this, so as long as the sample sizes are balanced you
> might not be too badly off using a permutation method. Otherwise
> bootstrapping (within group based on the residuals) should be preferred.

Not forgetting that the two-sample t-test performs fine under the same circumstances (large
balanced samples), even for non-normal distributions and unequal variances.
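
A rough simulation sketch of the point about balance, in Python. Everything in it is an assumption chosen for illustration: normal data with equal means, standard deviations 3 and 1, group sizes 25/25 versus 10/40, a normal approximation for the Wilcoxon statistic, and a hard-coded approximate critical value for t with 48 degrees of freedom. It estimates how often each test rejects at the nominal 5% level when the means are equal but the variances are not.

```python
import math
import random
import statistics

T_CRIT = 2.0106  # approx. two-sided 5% critical value, Student's t, 48 df
Z_CRIT = 1.96    # two-sided 5% critical value, standard normal

def pooled_t(x, y):
    """Two-sample t statistic with a pooled variance estimate."""
    n1, n2 = len(x), len(y)
    s2 = ((n1 - 1) * statistics.variance(x)
          + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    return (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(s2 * (1 / n1 + 1 / n2))

def wilcoxon_z(x, y):
    """Rank-sum statistic standardised by its null mean and variance (no ties assumed)."""
    n1, n2 = len(x), len(y)
    labelled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    w = sum(rank for rank, (_, group) in enumerate(labelled, start=1) if group == 0)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean_w) / sd_w

def rejection_rates(n1, n2, sd1, sd2, n_sim=2000, seed=1):
    """Estimated type 1 error of both tests when means are equal but sds differ."""
    rng = random.Random(seed)
    reject_t = reject_w = 0
    for _ in range(n_sim):
        x = [rng.gauss(0.0, sd1) for _ in range(n1)]
        y = [rng.gauss(0.0, sd2) for _ in range(n2)]
        reject_t += abs(pooled_t(x, y)) > T_CRIT
        reject_w += abs(wilcoxon_z(x, y)) > Z_CRIT
    return {"t": reject_t / n_sim, "wilcoxon": reject_w / n_sim}

# Balanced design, then an unbalanced one where the smaller group is more variable.
balanced = rejection_rates(25, 25, 3.0, 1.0)
unbalanced = rejection_rates(10, 40, 3.0, 1.0)
print("balanced  :", balanced)
print("unbalanced:", unbalanced)
```

Under these assumptions one would expect both tests to reject at close to the nominal rate in the balanced design and noticeably more often in the unbalanced one. The exact figures depend on the seed and on the settings above.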

Regards
Gordon

> Regards,
>
> Claus
>
> --
> ***********************************************************************************
>  Claus-D. Mayer                       | http://www.bioss.ac.uk
>  Biomathematics & Statistics Scotland | email: claus at bioss.ac.uk
>  Rowett Research Institute            | Telephone: +44 (0) 1224 716652
>  Aberdeen AB21 9SB, Scotland, UK.     | Fax: +44 (0) 1224 715349
> ***********************************************************************************