[Rd] Re: [R] Canberra dist and double zeros
Prof Brian Ripley
ripley@stats.ox.ac.uk
Tue, 6 Mar 2001 09:40:30 +0000 (GMT)
On Tue, 6 Mar 2001, Jari Oksanen wrote:
> ripley@stats.ox.ac.uk said:
> > [Moved to R-devel, as more appropriate.]
>
> This means that I probably have to subscribe (momentarily) to R-devel, which I
> have regarded as too technical for a non-developer like me.
We'll keep you on the Cc: list. Normally things like this are on R-devel,
as they are specialized.
> ripley@stats.ox.ac.uk said:
> > I am sure we should do something, but is this exactly right?
>
> I am not sure either: it is right for me in my present applications, but I
> think it may not be right in general. I used dist() for community data, where
> zero *is* zero (not merely a floating-point number that is approximately zero)
> and means that the species is absent, and of course all numbers are positive
> or zero. Canberra distance is OK for negative numbers as well, and so x_i = -1,
> y_i = 1 would yield 2/0, which probably shouldn't be regarded as zero, but
> rather as NaN. So a better test would be for an above-zero numerator, or an
> explicit test that both x_i and y_i are zero.
I think it should be Inf, and I was going to comment that this was another
problem.
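A minimal C sketch of a single term, assuming the |x_i - y_i| / |x_i + y_i| form of the Canberra term discussed in this thread and IEEE 754 arithmetic (no -ffast-math). Under those assumptions 0/0 comes out as NaN and a non-zero numerator over a zero denominator as +Inf, so the double-zero case and the opposite-sign case can be told apart after the division. The name canberra_term is hypothetical and is not R's internal code.

```c
#include <math.h>

/* Hypothetical helper: one Canberra term |x - y| / |x + y|.
   With IEEE 754 doubles:
     x = y = 0        -> 0/0 -> NaN  (double zero)
     x = 1, y = -1    -> 2/0 -> +Inf (opposite signs, zero sum)
     x = 3, y = 1     -> 2/4 -> 0.5  (ordinary term)                */
static double canberra_term(double x, double y)
{
    return fabs(x - y) / fabs(x + y);
}
```

For example, isnan(canberra_term(0.0, 0.0)) and isinf(canberra_term(1.0, -1.0)) are both true, while canberra_term(3.0, 1.0) is 0.5.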
> ripley@stats.ox.ac.uk said:
> > The issue is if count should be incremented if sum == 0.0 or not.
>
> I don't know, and I don't have Lance & Williams 1967 to check. However, more
> recent papers by Canberra people do *not* increment count for double-zeros
> (Faith, Minchin, Belbin 1987. Compositional dissimilarity as a robust measure
> of ecological distance. Vegetatio 69, 57-68.). I have no idea about the
> really *correct* solution or what the arguments are for incrementing or not
> incrementing count. At least, not incrementing means that count varies with
> pairs of observations instead of being a simple down-scaling by a constant for
> the entire matrix. However, probably the original Lance & Williams choice was
> to increment only for sum > 0.
Note that count is only relevant if count < nc, and the code in 1.2.2 is wrong:
it should have been
if(count != nc) dist /= ((double)count/nc);
Fortunately, it was never used.
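The corrected line divides the partial sum by count/nc, which is the same as multiplying it by nc/count. A small sketch, with a hypothetical helper name, assuming count > 0 (a caller would have to deal separately with the case where every term was skipped):

```c
/* Hypothetical helper: scale a sum taken over `count` of the `nc`
   columns up to the full number of columns. Assumes count > 0.
   Dividing by count/nc is equivalent to multiplying by nc/count. */
static double rescale_partial_sum(double dist, int count, int nc)
{
    if (count != nc)
        dist /= ((double)count / nc);  /* the corrected line quoted above */
    return dist;
}
```

For example, rescale_partial_sum(3.0, 2, 4) gives 6.0: a sum of 3 over 2 of the 4 columns is scaled up as if all 4 had contributed at the same average rate, while rescale_partial_sum(1.5, 4, 4) leaves 1.5 unchanged.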
> Some other people may have better libraries to
> check both the choice and the argument (I may have a look there, but I would
> be surprised to find Aust. Comput. J. 1, 15-20 here). Deciding whether to
> increment count would need a test for an above-zero denominator, which begins
> to look like ugly code if we also need a test on the numerator.
You need that anyway to tell 2/0 apart from 0/0. We can code any solution,
and this is simple and clean compared to, say, scan.c!
I am going to implement it so that x1=x2=0 is treated as missing, and so that
x1=+1, x2=-1 gives 2/0 = Inf.
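A sketch of that rule for two rows of length nc, assuming the |x_i - y_i| / |x_i + y_i| term and IEEE 754 arithmetic. The function name is hypothetical and this is not the code that went into R. Double zeros are skipped without incrementing count, a zero denominator with a non-zero numerator contributes +Inf, and the sum is rescaled by nc/count when terms were skipped. Genuine NA values are not handled in this sketch.

```c
#include <math.h>

/* Hypothetical sketch of the proposed Canberra rule for rows x, y:
   - x[j] == y[j] == 0 is treated as missing (count not incremented);
   - a zero denominator with a non-zero numerator adds +Inf;
   - the sum is rescaled by nc/count if any term was skipped.
   Returns NaN when every term was skipped (nothing to compare). */
static double canberra_dist(const double *x, const double *y, int nc)
{
    double dist = 0.0;
    int count = 0;

    for (int j = 0; j < nc; j++) {
        double num = fabs(x[j] - y[j]);
        double den = fabs(x[j] + y[j]);

        if (num == 0.0 && den == 0.0)
            continue;          /* both zero: treat as missing */
        dist += num / den;     /* den == 0 here gives +Inf */
        count++;
    }
    if (count == 0)
        return NAN;
    if (count != nc)
        dist /= ((double)count / nc);
    return dist;
}
```

With x = {1, 0, 3, 0} and y = {1, 0, 1, 2}, the double zero in the second column is skipped, the remaining terms sum to 0 + 0.5 + 1 = 1.5 over 3 of the 4 columns, and the result is 1.5 / 0.75 = 2.0. Skipping rather than counting the double zero is exactly the choice that makes the scaling vary from pair to pair, as noted earlier in the thread.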
--
Brian D. Ripley, ripley@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html