[R] Why does match() treat NaN's as comparables; Bug or Feature?
Martin Maechler
maechler at stat.math.ethz.ch
Mon Feb 29 11:14:16 CET 2016
>>>>> Bert Gunter <bgunter.4567 at gmail.com>
>>>>> on Sat, 27 Feb 2016 19:06:05 -0800 writes:
> (on list, since others might not have gotten it either).
> OK, I get it now. It was I who misunderstood.
> But isn't the bug in the **misuse** of match() in ecdf()
> (by failing to specify the nomatch argument)? Jeff says
> comparisons with NaN should return an unordered result,
> which NaN is afaics:
>> NaN < 0
> [1] NA
>> NaN > 0
> [1] NA
> match() just does its thing:
>> match(c(NA,NaN),c(1,2,NA,3,4,NaN,5))
> [1] 3 6
> It's up to the caller to use it correctly, which
> apparently ecdf() fails to do.
> Am I missing something here?
Not much, if anything.
Let me still clarify:
1) This has *nothing* to do with match, and I am confused
why nobody has mentioned this till now.
2) In
x <- c(1,2,NA,3,4,NaN,5)
Fn <- ecdf(x)
there is no error: ecdf() deliberately drops all NA/NaN from its input
and returns the empirical CDF of the remaining elements,
so Fn is identical (practically, though not strictly formally) to
Fn. <- ecdf(1:5)
3) The bug is really in the underlying C code of approx() / approxfun(),
on which ecdf() and notably the function it creates (!)
rely:
> L <- approxfun(1:6, 1:6, method = "constant")
> L( (2:10)/2)
[1] 1 1 2 2 3 3 4 4 5
> L( c(NaN, NA, 2:10)/2)
[1] 5 NA 1 1 2 2 3 3 4 4 5
4) A fix for this bug has been committed to R-devel already,
a minute ago. [svn rev 70239]
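
For readers on a version of R that predates this fix, points 2) and 3) above can be checked, and the symptom worked around, with a short sketch along the following lines. The helper name `ecdf_na_safe` is hypothetical and not part of R; the sketch only assumes that the function returned by ecdf() behaves correctly for non-missing input, as described in 2).

```r
# Point 2): ecdf() drops NA/NaN, so both ECDFs have the same jump points.
x   <- c(1, 2, NA, 3, 4, NaN, 5)
Fn  <- ecdf(x)
Fn. <- ecdf(1:5)
same_knots  <- isTRUE(all.equal(knots(Fn), knots(Fn.)))
same_values <- isTRUE(all.equal(Fn(1:5), Fn.(1:5)))

# Workaround sketch (hypothetical helper, not part of R):
# never pass NA/NaN to the step function; return NA for them instead.
ecdf_na_safe <- function(x) {
  Fn <- stats::ecdf(x)
  function(q) {
    out <- rep(NA_real_, length(q))
    ok  <- !is.na(q)        # is.na() is TRUE for both NA and NaN
    out[ok] <- Fn(q[ok])    # evaluate only at non-missing points
    out
  }
}

y   <- c(1, 2, 3, NaN, 4, 5)
res <- ecdf_na_safe(y)(y)
print(res)                  # the NaN position yields NA
```

The wrapper sidesteps the question of what approxfun() does with NaN by filtering before the call, so it should give the same answer with or without the fix.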
Martin Maechler,
ETH Zurich
> Bert Gunter
> "The trouble with having an open mind is that people keep
> coming along and sticking things into it." -- Opus (aka
> Berkeley Breathed in his "Bloom County" comic strip )
> On Sat, Feb 27, 2016 at 3:49 PM, Jason Thorpe
> <jdthorpe at gmail.com> wrote:
>> The bug is that NaN is not part of any cumulative
>> distribution...
>>
>> -Jason sent from my mobile device
>>
>> On Feb 27, 2016 3:34 PM, "Bert Gunter"
>> <bgunter.4567 at gmail.com> wrote:
>>>
>>> If I understand you correctly, the "bug" is that you do
>>> not understand match(). See inline comment below and
>>> note carefully the "Value" section of ?match.
>>>
>>> Cheers, Bert
>>>
>>> Bert Gunter
>>>
>>> "The trouble with having an open mind is that people
>>> keep coming along and sticking things into it." -- Opus
>>> (aka Berkeley Breathed in his "Bloom County" comic strip
>>> )
>>>
>>>
>>> On Sat, Feb 27, 2016 at 2:52 PM, Jason Thorpe
>>> <jdthorpe at gmail.com> wrote:
>>> > For some reason `match()` treats `NaN`'s as comparables by default:
>>> >
>>> >> x <- c(1,2,3,NaN,4,5)
>>> >> match(x,x)
>>> > [1] 1 2 3 4 5 6
>>> >
>>> > which I can override when using `match()` directly:
>>> >
>>> >> match(x,x,incomparables=NaN)
>>> > [1] 1 2 3 NA 5 6
>>> >
>>> > but not necessarily when calling a function that uses `match()`
>>> > internally:
>>> >
>>> >> stats::ecdf(x)(x)
>>> > [1] 0.2 0.4 0.6 0.8 0.8 1.0
>>> >
>>> > Obviously there are workarounds for any given scenario, but the bigger
>>> > problem is that this behavior causes difficult-to-discover bugs. For
>>> > example, the behavior of stats::ecdf is definitely a bug introduced by
>>> > its use of `match()` (unless you think NaN == 4 is correct).
>>>
>>> No, you misunderstand. match() returns the POSITION of
>>> the match, and clearly NaN in the 4th position of table = x
>>> matches NaN in x, e.g.
>>>
>>> > match(c(x,NaN),x)
>>> [1] 1 2 3 4 5 6 4
>>>
>>>
>>>
>>> >
>>> > Is there a good reason that NaN's are treated as comparables by match(),
>>> > or is this a bug?
>>> >
>>> > For reference, I'm using R version 3.2.3
>>> >
>>> > -Jason
>>> >
>>> > [[alternative HTML version deleted]]
>>> >
>>> > ______________________________________________
>>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> > https://stat.ethz.ch/mailman/listinfo/r-help
>>> > PLEASE do read the posting guide
>>> > http://www.R-project.org/posting-guide.html
>>> > and provide commented, minimal, self-contained, reproducible code.