[R] Test statistic for Spearman correlation
Martin Maechler
maechler at stat.math.ethz.ch
Thu May 1 21:54:58 CEST 2003
>>>>> "PD" == Peter Dalgaard BSA <p.dalgaard at biostat.ku.dk>
>>>>> on 01 May 2003 19:20:04 +0200 writes:
PD> Thomas W Blackwell <tblackw at umich.edu> writes:
>> Brett -
>>
>> I can give you a further reference, but you may not find
>> it much help !
>>
>> E. G. Olds. Distribution of sums of squares of rank
>> differences for small numbers of individuals. Annals of
>> Mathematical Statistics, v.9, pp. 133-148, 1938.
>>
>> My source says that "Olds (1938) tabulated the exact
>> distribution of a quantity S related to rho by the
>> equation
>>
>> R = 1 - 6 * S / (n^3 - n) ."
>>
>> Olds must have been using a Comptometer or a Marchant
>> calculator, so presumably this construction guarantees
>> that S is always an integer. Algorithm AS 89 is certainly
>> available online from StatLib.
PD> The title of Olds' paper might have given you a hint:
>> x <- rank(rnorm(10))
>> y <- rank(rnorm(10))
>> cor(x,y)
PD> [1] -0.2242424
>> 990/6*(1-cor(x,y))
PD> [1] 202
>> sum((x-y)^2)
PD> [1] 202
PD> BTW, the identity breaks down when there are ties,
PD> something that we probably ought to look into at some
PD> point. The code does say that the p values may be
PD> incorrect, but I suspect they may be more incorrect than
PD> need be.
Yes, I'm quite sure of this (both/all statements).
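For illustration, here is a small R sketch of both points: the identity S = (n^3 - n)/6 * (1 - rho) holding for untied ranks, and failing once midranks appear. The seed and the tied example vectors are arbitrary choices made up for this sketch, not data from the thread.

```r
# Without ties: the sum of squared rank differences S satisfies
#   S = (n^3 - n) / 6 * (1 - rho)
set.seed(1)                      # arbitrary seed, only for reproducibility
n <- 10
x <- rank(rnorm(n))
y <- rank(rnorm(n))
S_direct   <- sum((x - y)^2)
S_from_rho <- (n^3 - n) / 6 * (1 - cor(x, y))
print(all.equal(S_direct, S_from_rho))   # identity holds up to rounding

# With ties: rank() assigns midranks (1.5, 1.5, ...) and the identity fails.
# These two vectors are made-up illustrative data.
xt <- rank(c(1, 1, 2, 3, 4))
yt <- rank(c(2, 1, 3, 5, 4))
nt <- length(xt)
St_direct   <- sum((xt - yt)^2)
St_from_rho <- (nt^3 - nt) / 6 * (1 - cor(xt, yt))
print(c(direct = St_direct, from_rho = St_from_rho))  # the two values differ
```

With ties, S need not even be an integer, so a table of the exact distribution built for untied ranks no longer applies directly.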
Note that I still have uncommitted fixes for the large-n problem.
For the "proper" fix, I did get interested, and over the last few weeks
I have spent quite some time reading several of the original papers on this.
I also found that there now are much better (i.e. faster)
methods available for exact calculation of P-values.
Currently I plan for 1.7.1 to have an improvement here, and for
1.8.0 to have more.
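To make concrete what an "exact" P-value means here, the following is a brute-force sketch that enumerates all n! permutations to get the null distribution of S for untied ranks. It is deliberately naive and only feasible for small n. It is not AS 89, nor one of the faster methods mentioned above, nor what cor.test() does. The names perms, exact_S_null, and s_obs are hypothetical helpers invented for this illustration.

```r
# All permutations of a vector, as a list (naive recursion; small n only).
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# Null distribution of S = sum of squared rank differences, no ties:
# under independence every permutation of the ranks is equally likely.
exact_S_null <- function(n) {
  id <- seq_len(n)
  vapply(perms(id), function(p) sum((p - id)^2), numeric(1))
}

n <- 6
S_null <- exact_S_null(n)        # one value per permutation, n! in total

# One-sided exact P-value for a hypothetical observed S
# (small S corresponds to positive association).
s_obs <- 4
p_lower <- mean(S_null <= s_obs)
print(p_lower)
```

The cost grows as n!, which is why this approach stops being practical very quickly and why faster exact algorithms matter.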
Martin