[Rd] Canberra distance and binary distance

Christophe Genolini cgenolin at u-paris10.fr
Sat Feb 6 17:49:14 CET 2010


I think there is also a problem with the binary distance, since

x <- y <- rep(0,10)
dist(rbind(x,y),method="binary")

gives 0, whereas it is supposed to be undefined. (The binary distance, 
aka asymmetric binary, is not supposed to take the (off, off) pairs into 
account in its calculation.)
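
For illustration only, here is a minimal sketch of an asymmetric binary distance that treats the all-(off, off) case as undefined. The function name binary_dist_strict is hypothetical and is not part of R; it assumes the usual convention that non-zero entries are "on" and zero entries are "off".

```r
# Hypothetical helper (not part of R): asymmetric binary distance that
# returns NA when neither vector has any "on" entry.
binary_dist_strict <- function(x, y) {
  on_x <- x != 0
  on_y <- y != 0
  either <- sum(on_x | on_y)          # positions where at least one is on
  if (either == 0) return(NA_real_)   # only (off, off) pairs: 0/0, undefined
  sum(xor(on_x, on_y)) / either       # proportion where exactly one is on
}

x <- y <- rep(0, 10)
binary_dist_strict(x, y)                           # NA
binary_dist_strict(c(1, 0, 1, 0), c(1, 1, 0, 0))   # 2/3

# Built-in result, for comparison with the strict version above
dist(rbind(x, y), method = "binary")
```

In the second call, three positions have at least one bit on and two of them have exactly one bit on, hence 2/3.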

Christophe

> The definition I use is the one found in the book "Cluster Analysis" by 
> Brian Everitt, Sabine Landau and Morven Leese.
> As the defining paper for the Canberra distance, they cite an article by 
> Lance and Williams, "Computer programs for hierarchical polythetic 
> classification", Computer Journal, 1966.
> I do not have access to it, but here is the link: 
> http://comjnl.oxfordjournals.org/cgi/content/abstract/9/1/60
> Hope this helps.
>
> Christophe
>> On 06/02/2010 10:39 AM, Christophe Genolini wrote:
>>> Hi list,
>>>
>>> As far as I know, the Canberra distance between X and Y is: 
>>> sum[ (|x_i - y_i|) / (|x_i|+|y_i|) ] (with | | denoting the 
>>> absolute value function).
>>> In the source code of the Canberra distance, in the file distance.c, 
>>> we find:
>>>
>>>     sum = fabs(x[i1] + x[i2]);
>>>     diff = fabs(x[i1] - x[i2]);
>>>     dev = diff/sum;
>>>
>>> which corresponds to the formula: sum[ (|x_i - y_i|) / (|x_i+y_i|) ]
>>> (Note that this does not define a distance. It is correct when 
>>> x_i and y_i are positive, but not when a value is negative.)
>>>
>>> Is it on purpose or is it a bug?
>>
>> It matches the documentation in ?dist, so it's not just a coding 
>> error.  It will give the same value as your definition if the two 
>> items have the same sign (not only both positive), but different 
>> values if the signs differ.
>>
>> The first three links I found searching Google Scholar for "Canberra 
>> distance" all define it only for non-negative data.  One of them gave 
>> exactly the R formula (even though the absolute value in the 
>> denominator is redundant), the others just put x_i + y_i in the 
>> denominator.
>>
>> None of the three papers cited the origin of the definition, so I 
>> can't tell you who is wrong.
>>
>> Duncan Murdoch
>>
>>
>
>
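
The difference between the two Canberra formulas discussed in the quoted messages can be checked numerically. The sketch below is only an illustration: the helper names canberra_abs_sum and canberra_sum_abs are hypothetical, the test vectors are made up, and zero entries are avoided so that no 0/0 terms arise.

```r
# Hypothetical helpers, written only to compare the two formulas.
# Denominator |x_i| + |y_i| (the textbook definition quoted above)
canberra_abs_sum <- function(x, y) sum(abs(x - y) / (abs(x) + abs(y)))
# Denominator |x_i + y_i| (as in the quoted lines from distance.c)
canberra_sum_abs <- function(x, y) sum(abs(x - y) / abs(x + y))

# Same sign in every coordinate: both formulas agree
a <- c(1, 2, 3)
b <- c(2, 4, 1)
canberra_abs_sum(a, b)     # 1/3 + 2/6 + 2/4 = 7/6
canberra_sum_abs(a, b)     # 7/6
canberra_sum_abs(-a, -b)   # 7/6 as well (both negative)

# Mixed signs: the formulas differ
u <- c(1, -2)
v <- c(-3, 1)
canberra_abs_sum(u, v)     # 4/4 + 3/3 = 2
canberra_sum_abs(u, v)     # 4/2 + 3/1 = 5

# Built-in result; which denominator is used may depend on the R version
dist(rbind(u, v), method = "canberra")
```

With the |x_i| + |y_i| denominator each term is at most 1, so the sum cannot exceed the number of coordinates; with |x_i + y_i| the mixed-sign example already gives 5 for two coordinates.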


