[R] spearman rank correlation problem
Martin Maechler
maechler at stat.math.ethz.ch
Tue Mar 16 10:25:49 CET 2004
>>>>> "William" == William T Morgan <wmorgan at mitre.org>
>>>>> on 15 Mar 2004 16:37:08 -0500 writes:
William> Hello R gurus,
William> I want to calculate the Spearman rho between two ranked lists. I am
William> getting results with cor.test that differ in comparison to my own
William> spearman function:
William> > my.spearman
William> function(l1, l2) {
William> if(length(l1) != length(l2)) stop("lists must have same length")
William> r1 <- rank(l1)
William> r2 <- rank(l2)
William> dsq <- sapply(r1-r2,function(x) x^2)
William> 1 - ((6 * sum(dsq)) / (length(l1) * (length(l1)^2 - 1)))
William> }
William> Perhaps I'm doing something wrong in that code, but it's a pretty
William> straightforward calculation, so it's hard to see what, especially with
William> rank() handling the ties correctly.
Well, the "ties" in your example are really the "problem".
The formula you use,
    1 - 6 S(d^2) / (n^3 - n)     ( S = sum ;  d = r1 - r2 ;  r{1,2} := rank(x{1,2}) )
is only equal to the more natural definition,
cor(r1, r2), in the situation when there are no ties
[plus in a few "lucky" situations with ties].
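A small illustration of the difference (only a sketch with made-up data; the name spearman_formula is chosen here for illustration and is not an R function): the two definitions agree when there are no ties and drift apart once a single tie is introduced.

```r
# Textbook formula 1 - 6 * sum(d^2) / (n^3 - n), with d the rank differences.
# 'spearman_formula' is a hypothetical helper name used only for this sketch.
spearman_formula <- function(x, y) {
  stopifnot(length(x) == length(y))
  n <- length(x)
  d <- rank(x) - rank(y)
  1 - 6 * sum(d^2) / (n^3 - n)
}

y      <- c(2, 1, 4, 3, 5)
x_free <- c(1, 2, 3, 4, 5)   # no ties
x_tied <- c(1, 2, 2, 3, 5)   # one tie: rank(x_tied) is 1, 2.5, 2.5, 4, 5

# Without ties, the formula and the correlation of the ranks coincide.
no_tie_formula <- spearman_formula(x_free, y)
no_tie_cor     <- cor(rank(x_free), rank(y))

# With the tie, they no longer give the same number.
tie_formula <- spearman_formula(x_tied, y)
tie_cor     <- cor(rank(x_tied), rank(y))

print(c(formula = no_tie_formula, cor.of.ranks = no_tie_cor))
print(c(formula = tie_formula,    cor.of.ranks = tie_cor))
```
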
cor.test() in R has always used the correlation of the ranks, and so
does cor(*, method = "spearman"), which is now available as well.
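This can be checked directly (again just a sketch with made-up data, assuming an R version in which cor() accepts method = "spearman"; exact = FALSE is passed only to avoid the warning that cor.test() gives about exact p-values with ties):

```r
x <- c(1, 2, 2, 3, 5)   # contains a tie
y <- c(2, 1, 4, 3, 5)

# Pearson correlation computed on the (mid-)ranks
rho_ranks <- cor(rank(x), rank(y))

# Spearman's rho as reported by cor() and by cor.test()
rho_cor  <- cor(x, y, method = "spearman")
rho_test <- unname(cor.test(x, y, method = "spearman", exact = FALSE)$estimate)

print(c(ranks = rho_ranks, cor = rho_cor, cor.test = rho_test))
```
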
It seems that this needs to be documented, since you are right:
the "1 - 6 S / (..)" formula is also in use as a *definition* of
Spearman's rank correlation.
Martin Maechler <maechler at stat.math.ethz.ch> http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum LEO C16 Leonhardstr. 27
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-3408 fax: ...-1228 <><