[Rd] O(N log N) Kendall Tau
Dirk Eddelbuettel
edd at debian.org
Sun Dec 13 17:27:03 CET 2009
On 13 December 2009 at 00:38, David Simcha wrote:
| I've noticed that the implementation of Kendall's Tau in R is O(N^2).
| The following reference describes how it can be done in O(N log N):
|
| A Computer Method for Calculating Kendall's Tau with Ungrouped Data
| William R. Knight
| Journal of the American Statistical Association, Vol. 61, No. 314, Part
| 1 (Jun., 1966), pp. 436-439
| http://www.jstor.org/pss/2282833
Interesting. I recently used the performance of cor(X, method="kendall") as
a contrast to the 26-fold speedup when (!!) using gpuCor(X, method="kendall")
from the nice gputools package by Josh Buckner and Mark Seligman. That was
using the GPU in a QuadroFX 4800 with a 1206 x 477 matrix; the full example
is in my most recent 'Intro to HPC with R' slides.
| I'm interested in contributing an implementation of this algorithm to
| R. However, I have a few questions:
|
| 1. Would it likely be accepted if well-implemented?
| 2. cov.c in the main/ directory of the R source tree seems to contain
| at least 4 different but nearly identical implementations of Kendall's
| Tau: cov_complete1, cov_complete2, cov_na1, and cov_na2. I can't tell
| why.
Combine your bottom-up code analysis with a top-down usage analysis -- open
up R and read 'help(cov)'. There are different ways to deal with missing
observations.
| The file is very difficult to read because of its lack of
| comments, (ab)use of macros and improper indentation. As I will
| probably need to modify this file, can some seasoned veteran please
| explain what the heck is going on in this file?
Again, I think reading the help file along with the code may prove
beneficial. The R implementation is pretty consistent across files to you
will have to get used to those C level macros if you want to modify code at
that level. The R Extensions manual may prove helpful.
Lastly, as a matter of style, I find people are more likely to help you if
you don't hit them first with a two-by-four of 'your code and style is horrid'.
Cheers, Dirk
--
Three out of two people have difficulties with fractions.
More information about the R-devel
mailing list