[Rd] Printing the null hypothesis
Liviu Andronic
landronimirc at gmail.com
Sun Aug 16 18:30:15 CEST 2009
On 8/16/09, Ted Harding <Ted.Harding at manchester.ac.uk> wrote:
> > Oh, I had a slightly different H0 in mind. In the given example,
> > cor.test(..., met="kendall") would test "H0: x and y are independent",
> > but cor.test(..., met="pearson") would test: "H0: x and y are not
> > correlated (or `are linearly independent')" .
>
>
> Ah, now you are playing with fire! What the Pearson, Kendall and
> Spearman coefficients in cor.test measure is *association*. OK, if
> the results clearly indicate association, then the variables are
> not independent. But it is possible to have two variables x, y
> which are definitely not independent (indeed one is a function of
> the other) which yield zero association by any of these measures.
>
> Example:
> x <- (-10:10) ; y <- x^2 - mean(x^2)
> cor.test(x,y,method="pearson")
> # Pearson's product-moment correlation
> # t = 0, df = 19, p-value = 1
> # alternative hypothesis: true correlation is not equal to 0
> # sample estimates: cor 0
> cor.test(x,y,method="kendall")
> # Kendall's rank correlation tau
> # z = 0, p-value = 1
> # alternative hypothesis: true tau is not equal to 0
> # sample estimates: tau 0
> cor.test(x,y,method="spearman")
> # Spearman's rank correlation rho
> # S = 1540, p-value = 1
> # alternative hypothesis: true rho is not equal to 0
> # sample estimates: rho 0
>
> If you wanted, for instance, that the "method=kendall" should
> announce that it is testing "H0: x and y are independent" then
> it would seriously mislead the reader!
>
I did take the null statement from the description of
Kendall::Kendall() ("Computes the Kendall rank correlation and its
p-value on a two-sided test of H0: x and y are independent."). Here,
perhaps "monotonically independent" (as opposed to "functionally
independent") would have been more appropriate.
Still, this very example seems to support my original idea: users can
easily get confused about what the exact null of a test is. Does it
test for "association" or for "no association", for "normality" or for
"lack of normality"? Printing a precise and appropriate statement of
the null would prove helpful in interpreting the results, and in
avoiding misinterpreting them.
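To make the suggestion concrete, here is a minimal sketch of what such a print-out could look like. The helper `print_with_null()` is purely hypothetical (it is not part of R); it simply prepends an explicit H0 statement to the usual "htest" output, using Ted's parabola example:

```r
## Hypothetical helper (not part of R): print an explicit statement of
## the null hypothesis before the standard htest print-out.
print_with_null <- function(ht, null) {
  cat("H0:", null, "\n")
  print(ht)
  invisible(ht)
}

x <- -10:10
y <- x^2 - mean(x^2)   # y is a function of x, yet linearly uncorrelated
print_with_null(cor.test(x, y, method = "pearson"),
                "true (linear) correlation is equal to 0")
```

With such a line on screen, a reader would not have to infer from "alternative hypothesis: true correlation is not equal to 0" what the null actually was.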
> > Here both "H0: x is normal" and "Ha: x is not normal" are missing. At
> > least to beginners, these things are not always perfectly clear (even
> > after reading the documentation), and when interpreting the results it
> > can prove useful to have on-screen information about the null.
>
> This is possibly a more discussable point, in that even if you know
> what the Shapiro-Wilk statistic is, it is not obvious what it is
> sensitive to, and hence what it might be testing for. But I doubt
> that someone would be led to try the Shapiro-Wilk test in the
> first place unless they were aware that it was a test for normality,
> and indeed this is announced in the first line of the response.
> The alternative, therefore, is "non-normality".
>
To be particularly picky, as statistics tends to be, this is not so
obvious from the print-out. For the Shapiro-Wilk test one could indeed
deduce that, since it is a "test of normality", the null tested is
"H0: data are normal". This would not hold for, say, the Pearson
correlation. In loose language, it estimates and tests for
"correlation"; in more statistically appropriate language, it tests
for "no correlation" (or for "no association"). It feels to me that
without appropriate indicators, one can easily end up playing with
fire.
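The direction of the Shapiro-Wilk null can be illustrated directly. This is just a sketch: under "H0: the sample comes from a normal distribution", a small p-value argues *against* normality, so a normal sample typically yields a large p-value and a clearly skewed one a tiny p-value:

```r
## Shapiro-Wilk null is "H0: data come from a normal distribution",
## so small p-values are evidence of NON-normality.
set.seed(1)
p_norm <- shapiro.test(rnorm(100))$p.value  # normal sample: typically large
p_skew <- shapiro.test(rexp(100))$p.value   # skewed sample: typically tiny
c(p_norm = p_norm, p_skew = p_skew)
```

Without the print-out saying so, nothing stops a beginner from reading the small p-value as evidence *for* normality.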
> As to the contrast between absence of an "Ha" statement for the
> Shapiro-Wilk, and its presence in cor.test(), this comes back to
> the point I made earlier: cor.test() offers you three alternatives
> to choose from: "two-sided" (default), "greater", "less". This
> distinction can be important, and when cor.test() reports "Ha" it
> tells you which one was used.
>
> On the other hand, as far as Shapiro-Wilk is concerned there is
> no choice of alternatives (nor of anything else except the data x).
> So there is nothing to tell you! And, further, departure from
> normality has so many "dimensions" that alternatives like "two
> sided", "greater" or "less" would make no sense. One can think of
> tests targeted at specific kinds of alternative such as "Distribution
> is excessively skew" or "distribution has excessive kurtosis" or
> "distribution is bimodal" or "distribution is multimodal", and so on.
> But any of these can be detected by Shapiro-Wilk, so it is not
> targeted at any specific alternative.
>
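Ted's point about the asymmetry between the two tests can be checked against the function signatures themselves (shown here for illustration; cor.test.default is internal to the stats package):

```r
## shapiro.test() takes only the data, so there is no alternative to
## report; cor.test() exposes an 'alternative' argument, which is why
## its print-out includes an "Ha" line.
names(formals(shapiro.test))
## [1] "x"
"alternative" %in% names(formals(stats:::cor.test.default))
## [1] TRUE
```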
Thank you for these explanations. Best
Liviu