dist() {"mva" package} bug: treats +/- Inf as NA
Martin Maechler
Martin Maechler <maechler@stat.math.ethz.ch>
Mon, 21 Oct 2002 17:42:19 +0200
Vince Carey found this (thank you!).
Since the fix to the problem is not entirely obvious, I post
this to R-devel as RFC:
help(dist) says:
>> Missing values are allowed, and are excluded from all computations
>> involving the rows within which they occur. If some columns are
>> excluded in calculating a Euclidean, Manhattan or Canberra
>> distance, the sum is scaled up proportionally to the number of
>> columns used. If all pairs are excluded when calculating a
>> particular distance, the value is `NA'.
but the C code in ....../src/library/mva/src/distance.c,
has, e.g. for the Euclidean distance :
count= 0;
dist = 0;
for(j = 0 ; j < nc ; j++) {
if(R_FINITE(x[i1]) && R_FINITE(x[i2])) {
dev = (x[i1] - x[i2]);
dist += dev * dev;
count++;
}
i1 += nr;
i2 += nr;
}
if(count == 0) return NA_REAL;
if(count != nc) dist /= ((double)count/nc);
return sqrt(dist);
where it is clear that "R_FINITE(*)" should in principle be
replaced by "!ISNAN(*)".
Note however that "Inf - Inf -> NaN" and e.g the canberra metric
has more ways to get "NaN".
The current code drops all pairs with an +-Inf in it.
I would be inclined to really replace
if(R_FINITE(x[i1]) && R_FINITE(x[i2])) {
by
if(!ISNAN(x[i1]) && !ISNAN(x[i2])) {
for all metrics -- for R-patched.
But I'd also see reasons where we'd want to be smarter/different
than that. One possibility would be to drop the pair (as
currently) also when both are not finite,
for "binary" to signal an error for +- Inf,
and for "Canberra" :
Of course, d = |x - y| / |x + y|
will be 1 , when one of {x,y} is infinite.
This could be considered the desired answser, or also
we may give a warning in any case.
Opinions?
Martin Maechler <maechler@stat.math.ethz.ch> http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum LEO C16 Leonhardstr. 27
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-3408 fax: ...-1228 <><
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._