[R] Why does match() treat NaN's as comparables; Bug or Feature?

Mon Feb 29 11:14:16 CET 2016

>>>>> Bert Gunter <bgunter.4567 at gmail.com>
>>>>>     on Sat, 27 Feb 2016 19:06:05 -0800 writes:

    > (on list, since others might not have gotten it either).
    > OK, I get it now. It was I who misunderstood.

    > But isn't the bug in the **misuse** of match() in ecdf()
    > (by failing to specify the nomatch argument). Jeff says
    > comparisons with NaN should return an unordered result,
    > which NaN is afaics:

    >> NaN < 0
    > [1] NA
    >> NaN > 0
    > [1] NA

    > match() just does its thing:

    >> match(c(NA,NaN),c(1,2,NA,3,4,NaN,5))
    > [1] 3 6

    > It's up to the caller to use it correctly, which
    > apparently ecdf() fails to do.

    > Am I missing something here?

not much, if any.  

Let me still clarify :

1) This has *nothing* to do with match, and I am confused
   why nobody has mentioned this till now.

2) In

  x <- c(1,2,NA,3,4,NaN,5)
  Fn <- ecdf(x)

 there is no error: ecdf() does drop all NA/NaN from its input on purpose
 and returns the empirical CDF of the other elements:
 so Fn is identical (practically, not strictly formally) to

  Fn. <- ecdf(1:5)
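
 [A minimal sketch to check the claim in 2). The evaluation points in q
 are arbitrary values chosen for illustration; they are not from the
 original report.]

```r
# ecdf() drops NA/NaN from its input, so both functions are built
# from the same five observations.
x   <- c(1, 2, NA, 3, 4, NaN, 5)
Fn  <- ecdf(x)
Fn. <- ecdf(1:5)

# Arbitrary finite evaluation points (an assumption for this sketch).
q <- c(0, 1, 2.5, 5, 6)

# The two step functions agree at every finite point ...
same_values <- isTRUE(all.equal(Fn(q), Fn.(q)))

# ... and have the same jump locations.
same_knots <- isTRUE(all.equal(knots(Fn), 1:5))

print(same_values)
print(same_knots)
```

 The comparison uses all.equal() rather than identical() because the two
 objects are only "practically" the same: attributes such as the stored
 call differ.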

3) The bug is really in the underlying C code of approx() / approxfun(),
   on which ecdf() and notably the function it creates (!)
   rely:

    > L <- approxfun(1:6, 1:6, method = "constant")
    > L( (2:10)/2)
    [1] 1 1 2 2 3 3 4 4 5
    > L( c(NaN, NA, 2:10)/2)
    [1]  5 NA  1  1  2  2  3  3  4  4  5
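
   [For R versions without the fix, one possible workaround is to
   evaluate the step function only at non-missing points. The sketch
   below is an illustration, not the committed fix; safe_eval() is a
   hypothetical helper name, not part of R.]

```r
L <- approxfun(1:6, 1:6, method = "constant")

# Hypothetical helper: evaluate f only where v is neither NA nor NaN,
# and pass missing values through unchanged.
safe_eval <- function(f, v) {
  out <- rep(NA_real_, length(v))  # default: NA for missing inputs
  out[is.nan(v)] <- NaN            # keep NaN distinguishable from NA
  ok <- !is.na(v)                  # is.na() is TRUE for both NA and NaN
  out[ok] <- f(v[ok])
  out
}

v   <- c(NaN, NA, 2:10) / 2
res <- safe_eval(L, v)
print(res)
```

   Because the wrapper never passes NaN to L(), its result does not
   depend on how the installed version of approxfun() treats NaN.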

4) A fix for this bug has been committed to R-devel already,
   a minute ago. [svn rev 70239]

Martin Maechler,
ETH Zurich

    > Bert Gunter

    > "The trouble with having an open mind is that people keep
    > coming along and sticking things into it."  -- Opus (aka
    > Berkeley Breathed in his "Bloom County" comic strip )

    > On Sat, Feb 27, 2016 at 3:49 PM, Jason Thorpe
    > <jdthorpe at gmail.com> wrote:
    >> The bug is that NaN is not part of any cumulative
    >> distribution...
    >> 
    >> -Jason sent from my mobile device
    >> 
    >> On Feb 27, 2016 3:34 PM, "Bert Gunter"
    >> <bgunter.4567 at gmail.com> wrote:
    >>> 
    >>> If I understand you correctly, the "bug" is that you do
    >>> not understand match(). See inline comment below and
    >>> note carefully the "Value" section of ?match.
    >>> 
    >>> Cheers, Bert
    >>> 
    >>> Bert Gunter
    >>> 
    >>> "The trouble with having an open mind is that people
    >>> keep coming along and sticking things into it."  -- Opus
    >>> (aka Berkeley Breathed in his "Bloom County" comic strip
    >>> )
    >>> 
    >>> 
    >>> On Sat, Feb 27, 2016 at 2:52 PM, Jason Thorpe
    >>> <jdthorpe at gmail.com> wrote:
    >>> > For some reason `match()` treats `NaN`'s as comparables by default:
    >>> >
    >>> >> x <- c(1,2,3,NaN,4,5)
    >>> >> match(x,x)
    >>> > [1] 1 2 3 4 5 6
    >>> >
    >>> > which I can override when using `match()` directly:
    >>> >
    >>> >> match(x,x,incomparables=NaN)
    >>> > [1] 1 2 3 NA 5 6
    >>> >
    >>> > but not necessarily when calling a function that uses `match()`
    >>> > internally:
    >>> >
    >>> >> stats::ecdf(x)(x)
    >>> > [1] 0.2 0.4 0.6 0.8 0.8 1.0
    >>> >
    >>> > Obviously there are workarounds for any given scenario, but the bigger
    >>> > problem is that this behavior causes difficult to discover bugs.  For
    >>> > example, the behavior of stats::ecdf is definitely a bug introduced by
    >>> > its use of `match()` (unless you think NaN == 4 is correct).
    >>> 
    >>> No, you misunderstand. match() returns the POSITION of
    >>> the match, and clearly NaN in the 4th position of
    >>> table = x matches NaN in x. e.g.
    >>> 
    >>> > match(c(x,NaN),x)
    >>> [1] 1 2 3 4 5 6 4
    >>> 
    >>> 
    >>> 
    >>> >
    >>> > Is there a good reason that NaN's are treated as comparables by
    >>> > match(), or is this a bug?
    >>> >
    >>> > For reference, I'm using R version 3.2.3
    >>> >
    >>> > -Jason
    >>> >
    >>> > [[alternative HTML version deleted]]
    >>> >
    >>> > ______________________________________________
    >>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
    >>> > https://stat.ethz.ch/mailman/listinfo/r-help
    >>> > PLEASE do read the posting guide
    >>> > http://www.R-project.org/posting-guide.html
    >>> > and provide commented, minimal, self-contained, reproducible code.
