[Rd] Canberra distance
Prof Brian Ripley
ripley at stats.ox.ac.uk
Sun Feb 7 07:52:54 CET 2010
This is certainly ancient R history. The essence of the formula was
last changed
- dist += fabs(x[i1] - x[i2])/(x[i1] + x[i2]);
+ dist += fabs(x[i1] - x[i2])/fabs(x[i1] + x[i2]);
in October 1998. The help page description came later.
The
dist += fabs(x[i1] - x[i2])/(x[i1] + x[i2]);
form was there as 'canberra' in the first CVS archive in September
1997 (as src/library/mva/src/dist.c) so it looks like one of R&R was
the original author and this could be called pre-history.
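
For illustration only, here is a small standalone C sketch of the variants discussed in this thread. It is not the code in R's sources: the function names are made up for this example, and skipping terms whose denominator is zero is a simplifying assumption of the sketch rather than a statement about what dist() does.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Per-coordinate form with |x_i + y_i| in the denominator
 * (the "+" line of the diff above). Hypothetical name. */
double canberra_abs_of_sum(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);
    double dist = 0.0;
    for (size_t i = 0; i < n; i++) {
        double denom = fabs(x[i] + y[i]);
        if (denom == 0.0)
            continue;   /* assumption of this sketch: skip such terms */
        dist += fabs(x[i] - y[i]) / denom;
    }
    return dist;
}

/* Per-coordinate form with |x_i| + |y_i| in the denominator
 * (the definition quoted further down the thread). Hypothetical name. */
double canberra_sum_of_abs(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);
    double dist = 0.0;
    for (size_t i = 0; i < n; i++) {
        double denom = fabs(x[i]) + fabs(y[i]);
        if (denom == 0.0)
            continue;   /* assumption of this sketch: skip such terms */
        dist += fabs(x[i] - y[i]) / denom;
    }
    return dist;
}

/* Ratio of sums, sum(|x_i - y_i|) / sum(x_i + y_i), intended for
 * non-negative data only. Hypothetical name. */
double ratio_of_sums(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);
    double num = 0.0, denom = 0.0;
    for (size_t i = 0; i < n; i++) {
        num += fabs(x[i] - y[i]);
        denom += x[i] + y[i];
    }
    return denom == 0.0 ? 0.0 : num / denom;  /* sketch-only convention */
}
```

With x = (1, -2) and y = (3, 1), the first function gives 2/4 + 3/1 = 3.5 and the second gives 2/4 + 3/3 = 1.5, so the two per-coordinate forms differ as soon as a pair has mixed signs. With x = (1, 2) and y = (3, 1) they agree, and the ratio of sums is 3/7.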
On Sun, 7 Feb 2010, Bill.Venables at csiro.au wrote:
> That is interesting. The first of these, namely
>
> sum(|x_i - y_i|) / sum(x_i + y_i)
>
> is now better known in ecology as the Bray-Curtis distance. Even more interesting is the typo in Henry & Stevens "A Primer of Ecology in R" where the Bray-Curtis distance formula is actually the Canberra distance (Eq. 10.2 p. 289). There seems to be a certain slipperiness of definition in this field.
>
> What surprises me most is why ecologists still cling to this way of doing things. It is one of the few places I know of where the analysis is justified purely heuristically and not from any kind of explicit model for the ecological processes under study.
>
> Bill Venables.
>
>
>
> ________________________________________
> From: r-devel-bounces at r-project.org [r-devel-bounces at r-project.org] On Behalf Of Duncan Murdoch [murdoch at stats.uwo.ca]
> Sent: 07 February 2010 03:00
> To: genolini at u-paris10.fr
> Cc: r-devel at r-project.org
> Subject: Re: [Rd] Canberra distance
>
> On 06/02/2010 11:31 AM, Christophe Genolini wrote:
>> The definition I use is the one found in the book "Cluster analysis" by
>> Brian Everitt, Sabine Landau and Morven Leese.
>> They cite, as the defining paper for the Canberra distance, an article by
>> Lance and Williams, "Computer programs for hierarchical polythetic
>> classification", Computer Journal 1966.
>> I do not have access, but here is the link:
>> http://comjnl.oxfordjournals.org/cgi/content/abstract/9/1/60
>> Hope this helps.
>>
>
> I do have access to that journal, and that paper gives the definition
>
> sum(|x_i - y_i|) / sum(x_i + y_i)
>
> and suggests the variation
>
> sum( [ |x_i - y_i| / (x_i + y_i) ] )
>
> It doesn't call either one the Canberra distance; it calls the first one
> the "non-metric coefficient" and doesn't name the second. (I imagine
> the Canberra name came from the fact that the authors were at CSIRO in
> Canberra.)
>
> So I'd agree your definition is better, but I don't know if it can
> really be called the "Canberra distance".
>
> Duncan Murdoch
>
>> Christophe
>>> On 06/02/2010 10:39 AM, Christophe Genolini wrote:
>>>> Hi list,
>>>>
>>>> As far as I know, the Canberra distance between X and Y is:
>>>> sum[ (|x_i - y_i|) / (|x_i|+|y_i|) ] (with | | denoting the function
>>>> 'absolute value')
>>>> In the source code of the canberra distance in the file distance.c,
>>>> we find :
>>>>
>>>> sum = fabs(x[i1] + x[i2]);
>>>> diff = fabs(x[i1] - x[i2]);
>>>> dev = diff/sum;
>>>>
>>>> which corresponds to the formula: sum[ (|x_i - y_i|) / (|x_i+y_i|) ]
>>>> (note that this does not define a distance... This is correct when
>>>> x_i and y_i are positive, but not when a value is negative.)
>>>>
>>>> Is it on purpose or is it a bug?
>>> It matches the documentation in ?dist, so it's not just a coding
>>> error. It will give the same value as your definition if the two
>>> items have the same sign (not only both positive), but different
>>> values if the signs differ.
>>>
>>> The first three links I found searching Google Scholar for "Canberra
>>> distance" all define it only for non-negative data. One of them gave
>>> exactly the R formula (even though the absolute value in the
>>> denominator is redundant), the others just put x_i + y_i in the
>>> denominator.
>>>
>>> None of the 3 papers cited the origin of the definition, so I can't
>>> tell you who is wrong.
>>>
>>> Duncan Murdoch
>>>
>>>
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595