[Rd] Canberra distance and binary distance
cgenolin at u-paris10.fr
Sat Feb 6 17:49:14 CET 2010
I think there is also a problem with the binary distance, since for
x <- y <- rep(0,10)
it gives 0, whereas it is supposed to be undefined (the binary distance,
a.k.a. asymmetric binary, is not supposed to take the (off,off) pairs
into account in its calculation).
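Here is a minimal C sketch of the point above. It is not R's distance.c, and the function name asym_binary_dist is hypothetical. A value counts as "on" when it is nonzero, (off,off) pairs are ignored, and the result is the proportion of the remaining pairs in which exactly one value is on. When every pair is (off,off) the ratio is 0/0, so the sketch returns NaN instead of 0.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper (not R's code): asymmetric binary distance.
   Nonzero entries count as "on"; (off, off) pairs are ignored.
   Returns NAN when no pair has an "on" entry, because the ratio is 0/0. */
double asym_binary_dist(const double *x, const double *y, int n)
{
    assert(n >= 0);
    int informative = 0; /* pairs where at least one entry is on */
    int mismatched = 0;  /* pairs where exactly one entry is on */
    for (int i = 0; i < n; i++) {
        int x_on = (x[i] != 0.0);
        int y_on = (y[i] != 0.0);
        if (x_on || y_on) {
            informative++;
            if (x_on != y_on)
                mismatched++;
        }
    }
    if (informative == 0)
        return NAN; /* every pair was (off, off): undefined */
    return (double) mismatched / informative;
}
```

For two all-zero vectors of length 10 it returns NaN. For {1,0,1,0} and {1,1,0,0}, the last pair is (off,off) and is dropped, and two of the three remaining pairs differ, giving 2/3.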
> The definition I use is the one found in the book "Cluster Analysis" by
> Brian Everitt, Sabine Landau and Morven Leese.
> They cite, as the defining paper for the Canberra distance, an article by
> Lance and Williams, "Computer programs for hierarchical polythetic
> classification", Computer Journal, 1966.
> I do not have access to it, but here is the link:
> Hope this helps.
>> On 06/02/2010 10:39 AM, Christophe Genolini wrote:
>>> Hi list,
>>> As far as I know, the Canberra distance between X and Y is:
>>> sum[ (|x_i - y_i|) / (|x_i|+|y_i|) ] (with | | denoting the
>>> absolute value)
>>> In the source code of the Canberra distance in the file distance.c,
>>> we find:
>>> sum = fabs(x[i1] + x[i2]);
>>> diff = fabs(x[i1] - x[i2]);
>>> dev = diff/sum;
>>> which corresponds to the formula: sum[ (|x_i - y_i|) / (|x_i+y_i|) ]
>>> (note that this does not define a distance in general: it agrees with
>>> the formula above when x_i and y_i are positive, but not when a value
>>> is negative.)
>>> Is this intentional, or is it a bug?
>> It matches the documentation in ?dist, so it's not just a coding
>> error. It will give the same value as your definition if the two
>> items have the same sign (not only both positive), but different
>> values if the signs differ.
>> The first three links I found searching Google Scholar for "Canberra
>> distance" all define it only for non-negative data. One of them gave
>> exactly the R formula (even though the absolute value in the
>> denominator is redundant); the others just put x_i + y_i in the
>> denominator. None of the 3 papers cited the origin of the definition,
>> so I can't tell you who is wrong.
>> Duncan Murdoch
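The two Canberra formulas discussed in this thread can be compared with a small C sketch. The function names are hypothetical: canberra_r_form mirrors the quoted distance.c lines (|x_i - y_i| / |x_i + y_i|), and canberra_textbook uses |x_i - y_i| / (|x_i| + |y_i|). As an assumption of this sketch, terms where both entries are zero (0/0) are simply skipped; how R's own code treats such terms is not reproduced here.

```c
#include <assert.h>
#include <math.h>

/* Form from the quoted distance.c lines: sum |x_i - y_i| / |x_i + y_i|.
   Assumption: 0/0 terms (x_i == y_i == 0) are skipped. */
double canberra_r_form(const double *x, const double *y, int n)
{
    assert(n >= 0);
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        double diff = fabs(x[i] - y[i]);
        double sum = fabs(x[i] + y[i]);
        if (diff == 0.0 && sum == 0.0)
            continue; /* both entries zero: 0/0 skipped */
        total += diff / sum; /* x_i == -y_i != 0 gives +Inf */
    }
    return total;
}

/* Textbook form: sum |x_i - y_i| / (|x_i| + |y_i|).
   Same 0/0 assumption as above. */
double canberra_textbook(const double *x, const double *y, int n)
{
    assert(n >= 0);
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        double diff = fabs(x[i] - y[i]);
        double denom = fabs(x[i]) + fabs(y[i]);
        if (denom == 0.0)
            continue; /* both entries zero: 0/0 skipped */
        total += diff / denom; /* each term lies in [0, 1] */
    }
    return total;
}
```

With x = {-1, 2} and y = {-3, 4} (same sign in each coordinate, not only positive) both functions return 1/2 + 1/3. With x = {1} and y = {-3} the R form gives 4/2 = 2 while the textbook form gives 4/4 = 1, and with x = {1}, y = {-1} the R form divides a nonzero difference by zero and returns +Inf. In the textbook form each term is at most 1, since |x_i - y_i| <= |x_i| + |y_i|.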