[R] about spearman and kendal correlation coefficient calculation in "cor"
Thomas Lumley
tlumley at uw.edu
Mon May 16 23:46:26 CEST 2011
On Tue, May 17, 2011 at 9:03 AM, Baoqiang Cao <bqcaomail at gmail.com> wrote:
> Hi,
>
> I have the following two measurements stored in mat:
>
>> print(mat)
> [,1] [,2]
> [1,] -14.80976 -265.786
> [2,] -14.92417 -54.724
> [3,] -13.92087 -58.912
> [4,] -9.11503 -115.580
> [5,] -17.05970 -278.749
> [6,] -25.23313 -219.513
> [7,] -19.62465 -497.873
> [8,] -13.92087 -659.486
> [9,] -14.24629 -131.680
> [10,] -20.81758 -604.961
> [11,] -15.32194 -18.735
>
> To calculate the ranking correlation, I used "cor":
>> cor(mat[,1], mat[,2], method="spearman")
> [1] 0.2551259
>> cor(mat[,1], mat[,2], method="kendal")
> [1] 0.1834940
>
> However, when I tried to reproduce the two correlation coefficients by
> following their "defination", I got different results than from "cor".
> For Spearman, I got:
>
> od1 = order(mat[,1], decreasing=T)
> od2= order(mat[,2], decreasing=T)
>> 1-6*sum((od1-od2)^2)/(length(od1)^3-length(od1))
> [1] -0.2909091
You've confused order() and rank(), I think.
> This is different with from "cor", which is 0.255.
>
> For Kendal, I got:
>
> accord=0
> disaccord=0
>
> experi=mat[,1]
> target=mat[,2]
> N= length(experi)
> for(i in 1:(N-1)) {
> for(j in (i+1):N) {
> if((target[i] < target[j]) && (experi[i] < experi[j])) {
> accord=accord+1
> } else if ((target[i] > target[j]) && (experi[i] > experi[j])) {
> disaccord=disaccord+1
> }
> }
> }
And this is just wrong. Both if() clauses find concordant pairs
(Yi<Yj) & (Xi < Zj), and (Yi>Yj) & (Xi>Xj), but you're calling half of
them discordant and not counting the actual discordant pairs.
A quicker and correct computation would use outer(). In the absence
of ties, outer(x,x,"<")==outer(y,y,"<") gets you TRUE for concordant
and FALSE for discordant pairs, and sum() adds them up. This counts
each pair twice, so you need to rescale.
-thomas
>> (accord-disaccord)/(N*(N-1)/2)
> [1] -0.2181818
>
> This is also different with from "cor", which is 0.183.
>
> Anybody could help me out explaining the right answer? Thanks in
> advance!
>
> Baoqiang
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Thomas Lumley
Professor of Biostatistics
University of Auckland
More information about the R-help
mailing list