[Rd] Suggestion: add a warning in the help-file of unique()

(Ted Harding) Ted.Harding at manchester.ac.uk
Thu Apr 17 15:54:22 CEST 2008


On 17-Apr-08 10:44:32, Matthieu Stigler wrote:
> Hello
> 
> I'm sorry if this suggestion/correction was already made
> but after a search in devel list I did not find any mention
> of it. I would just suggest adding a warning or an example
> to the help-file of the function unique() like
> 
> "Note that unique() compares only identical values. Values
> which are printed equally but are in fact not identical
> will be treated as different."
> 
> 
>  > a<-c(0.2, 0.3, 0.2, 0.4-0.1)
>  > a
> [1] 0.2 0.3 0.2 0.3
>  > unique(a)
> [1] 0.2 0.3 0.3
> 
> Well this is just the idea and the sentence could be made better
> (my poor English...). Maybe a reference to R FAQ 7.31 could be made.
> Maybe this behaviour is clear and logical for experienced users,
> but I don't think it is for beginners. I personally spent two
> hours before seeing that the problem in my code came from this.
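
A minimal sketch of what happens in the quoted example, for anyone
who wants to check it. The rounding to 10 decimal places at the end
is only an assumed workaround for illustration, not something the
help-file recommends; choose the number of digits to suit your data.

```r
a <- c(0.2, 0.3, 0.2, 0.4 - 0.1)

# The two values that both print as 0.3 are not identical doubles.
same_exact <- a[2] == a[4]
print(same_exact)                      # FALSE

# all.equal() compares with a small tolerance, so it sees them as equal.
same_approx <- isTRUE(all.equal(a[2], a[4]))
print(same_approx)                     # TRUE

# Assumed workaround: round before calling unique().
u <- unique(round(a, 10))
print(u)
```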

The above is potentially a useful suggestion, and I would be
inclined to support it. However, for your other suggestion:

> I was thinking about modifying the function unique() to introduce
> a "tol" argument which allows comparison with a tolerance level
> (with default value zero to keep unique consistent) like all.equal(),
> but it seemed too complicated with my little understanding.
> 
> Best regards and many thanks for what you do for R!
> Matthieu Stigler

What is really complicated about it is that the results may
depend on the order of elements. When unique() eliminates only
values which are strictly identical to values which have been
scanned earlier, there is no problem.

But suppose you set "tol=0.11" in

unique(c(20.0, 30.0, 30.1, 30.2, 40.0))
# 20.0, 30.0, 40.0
[30.1 rejected because within 0.11 of previous 30.0;
 30.2 rejected because within 0.11 of previous 30.1]

and compare with

unique(c(20.0, 30.0, 30.2, 30.1, 40.0))
# 20.0, 30.0, 30.2, 40.0
[30.2 accepted because not within 0.11 of any previous value;
 30.1 rejected because within 0.11 of previous 30.2 or 30.0]
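
The order dependence can be reproduced with a small sketch. Note that
unique_tol() below is a hypothetical helper written for this
illustration; unique() has no "tol" argument. It assumes the rule
described in the brackets above: a value is rejected when it lies
within tol of any value scanned earlier, whether or not that earlier
value was itself kept.

```r
# Hypothetical helper, not part of base R.
unique_tol <- function(x, tol = 0) {
  keep <- logical(length(x))
  for (i in seq_along(x)) {
    earlier <- x[seq_len(i - 1)]   # every value scanned before x[i]
    keep[i] <- !any(abs(x[i] - earlier) <= tol)
  }
  x[keep]
}

r1 <- unique_tol(c(20.0, 30.0, 30.1, 30.2, 40.0), tol = 0.11)
r2 <- unique_tol(c(20.0, 30.0, 30.2, 30.1, 40.0), tol = 0.11)
print(r1)   # 20 30 40
print(r2)   # 20.0 30.0 30.2 40.0
```

The same five numbers give three "unique" values in one order and
four in the other.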

This kind of problem is always present in situations where
there are potential "chained tolerances".

You cannot see the difference between the position of the
hour-hand of a clock now, and one minute later.

But you may not chain this logic, for, if you could:

If A is indistinguishable from B, and B is indistinguishable
  from C, then A is indistinguishable from C.

10:00 is indistinguishable from 10:01 (on the hour-hand)
10:[n] is indistinguishable from 10:[n+1]

Hence, by induction, 10:00 is indistinguishable from 11:00

Which you do not want!
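
The clock argument can be put into numbers with a short sketch. The
threshold of 0.5 degrees is an assumption chosen for illustration: it
is how far the hour-hand moves in one minute.

```r
minutes <- 0:60
angle <- minutes * 0.5   # hour-hand position in degrees past 10:00
tol <- 0.5               # assumed "indistinguishable" threshold

# Every pair of consecutive minutes is within the tolerance ...
adjacent_close <- all(abs(diff(angle)) <= tol)
print(adjacent_close)    # TRUE

# ... but the two ends of the chain, 10:00 and 11:00, are not.
ends_close <- abs(angle[61] - angle[1]) <= tol
print(ends_close)        # FALSE
```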

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 17-Apr-08                                       Time: 14:54:19
------------------------------ XFMail ------------------------------


