[R] about spearman and kendal correlation coefficient calculation in "cor"

Thomas Lumley tlumley at uw.edu
Mon May 16 23:46:26 CEST 2011


On Tue, May 17, 2011 at 9:03 AM, Baoqiang Cao <bqcaomail at gmail.com> wrote:
> Hi,
>
> I have the following two measurements stored in mat:
>
>> print(mat)
>           [,1]     [,2]
>  [1,] -14.80976 -265.786
>  [2,] -14.92417  -54.724
>  [3,] -13.92087  -58.912
>  [4,]  -9.11503 -115.580
>  [5,] -17.05970 -278.749
>  [6,] -25.23313 -219.513
>  [7,] -19.62465 -497.873
>  [8,] -13.92087 -659.486
>  [9,] -14.24629 -131.680
> [10,] -20.81758 -604.961
> [11,] -15.32194  -18.735
>
> To calculate the ranking correlation, I used "cor":
>> cor(mat[,1], mat[,2], method="spearman")
> [1] 0.2551259
>> cor(mat[,1], mat[,2], method="kendal")
> [1] 0.1834940
>
> However, when I tried to reproduce the two correlation coefficients by
> following their "defination", I got different results than from "cor".
> For Spearman, I got:
>
> od1 = order(mat[,1], decreasing=T)
> od2= order(mat[,2], decreasing=T)
>> 1-6*sum((od1-od2)^2)/(length(od1)^3-length(od1))
> [1] -0.2909091

You've confused order() and rank(), I think.

> This is different with from "cor", which is 0.255.
>
> For Kendal, I got:
>
> accord=0
> disaccord=0
>
> experi=mat[,1]
> target=mat[,2]
> N= length(experi)
> for(i in 1:(N-1)) {
>   for(j in (i+1):N) {
>      if((target[i] < target[j]) && (experi[i] < experi[j])) {
>         accord=accord+1
>      } else if ((target[i] > target[j]) && (experi[i] > experi[j])) {
>         disaccord=disaccord+1
>      }
>   }
> }

And this is just wrong.  Both if() clauses find concordant pairs
(Yi<Yj) & (Xi < Zj), and (Yi>Yj) & (Xi>Xj), but you're calling half of
them discordant and not counting the actual discordant pairs.

A quicker and correct computation would use outer().  In the absence
of ties, outer(x,x,"<")==outer(y,y,"<") gets you TRUE for concordant
and FALSE for discordant pairs, and sum() adds them up.  This counts
each pair twice, so you need to rescale.

      -thomas


>> (accord-disaccord)/(N*(N-1)/2)
> [1] -0.2181818
>
> This is also different with from "cor", which is 0.183.
>
> Anybody could help me out explaining the right answer? Thanks in
> advance!
>
> Baoqiang
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland



More information about the R-help mailing list