[Rd] identical(0, -0)
Duncan Murdoch
murdoch at stats.uwo.ca
Fri Aug 7 18:55:50 CEST 2009
On 8/7/2009 11:41 AM, Martin Maechler wrote:
>>>>>> "DM" == Duncan Murdoch <murdoch at stats.uwo.ca>
>>>>>> on Fri, 07 Aug 2009 11:25:11 -0400 writes:
>
> DM> On 8/7/2009 10:46 AM, Martin Maechler wrote:
> >>>>>>> "TH" == Ted Harding <Ted.Harding at manchester.ac.uk>
> >>>>>>> on Fri, 07 Aug 2009 14:49:54 +0100 (BST) writes:
> >>
> TH> On 07-Aug-09 11:07:08, Duncan Murdoch wrote:
> >> >> Martin Maechler wrote:
> >> >>>>>>>> William Dunlap <wdunlap at tibco.com>
> >> >>>>>>>> on Thu, 6 Aug 2009 15:06:08 -0700 writes:
> >> >>> >> -----Original Message-----
> >> >>> >> From: r-help-bounces at r-project.org
> >> >>> >> [mailto:r-help-bounces at r-project.org] On Behalf Of Giovanni Petris
> >> >>> >> Sent: Thursday, August 06, 2009 3:00 PM
> >> >>> >> To: milton.ruser at gmail.com
> >> >>> >> Cc: r-help at r-project.org; Daniel.Gerlanc at geodecapital.com
> >> >>> >> Subject: Re: [R] Why is 0 not an integer?
> >> >>> >>
> >> >>> >>
> >> >>> >> I ran an instant experiment...
> >> >>> >>
> >> >>> >> > typeof(0)
> >> >>> >> [1] "double"
> >> >>> >> > typeof(-0)
> >> >>> >> [1] "double"
> >> >>> >> > identical(0, -0)
> >> >>> >> [1] TRUE
> >> >>> >>
> >> >>> >> Best, Giovanni
> >> >>>
> >> >>> > But 0.0 and -0.0 have different reciprocals
> >> >>>
> >> >>> >> 1.0/0.0
> >> >>> > [1] Inf
> >> >>> >> 1.0/-0.0
> >> >>> > [1] -Inf
> >> >>>
> >> >>> > Bill Dunlap TIBCO Software Inc - Spotfire Division wdunlap
> >> >>> > tibco.com
> >> >>>
> >> >>> yes. {finally something interesting in this boring thread !}
> ---> diverting to R-devel
> >> >>>
> >> >>> In April, I had a private e-mail exchange with John
> >> >>> Chambers [father of S, notably S4, which also brought identical()]
> >> >>> and Bill about the topic,
> >> >>> where I had started suggesting that R should be changed such
> >> >>> that
> >> >>> identical(-0. , +0.)
> >> >>> would return FALSE.
> >> >>> Bill did mention that it does so for (newish versions of) S+
> >> >>> and that he'd prefer that, too,
> >> >>> and John said
> >> >>>
> >> >>> >> I agree on having a preference for a bitwise comparison for
> >> >>> >> identical()---that's what the name means after all. But since
> >> >>> >> someone implemented the numerical case as the C == it's probably
> >> >>> >> going to be more hassle than it's worth to change it. But we
> >> >>> >> should make the implementation clear in the documentation.
> >> >>>
> >> >>> so in principle, we all agreed that R's identical() should be
> >> >>> changed here, namely by using something like memcmp() instead
> >> >>> of simple '==' , however we haven't bothered to actually
> >> >>> *implement* this change.
> >> >>>
> >> >>> I am currently testing a patch which would make
> >> >>> identical(0, -0) return FALSE.
> >> >>>
> >> >> I don't think that would be a good idea. Other expressions besides
> >> >> "-0"
> >> >> calculate the zero with the negative sign bit, e.g. the following
> >> >> sequence:
> >> >>
> >> >> pos <- 1
> >> >> neg <- -1
> >> >> zero <- 0
> >> >> y <- zero*pos
> >> >> z <- zero*neg
> >> >> identical(y, z)
> >> >>
> >> >> I think most R users would expect the last expression there to be
> >> >> TRUE based on the previous two lines, given that pos and neg both
> >> >> have finite values. In a simple case like this y == z would be a
> >> >> better test to use, but if those were components of a larger
> >> >> structure, identical() is all we've got, and people would waste a
> >> >> lot of time tracking down why structures differing only in the
> >> >> sign of zero were not identical, even though every element tested
> >> >> equal.
> >>
> >> identical() *is* not the same as '==' even if you think of a
> >> generalized '==',
> >> and your example is not convincing to me.
>
> DM> Fair enough, but after your change, how would one do what
> DM> identical(list(pos, neg, zero, y), list(pos, neg, zero, z)) does now?
> DM> That seems to me to be a more useful comparison than one that declares
> DM> those to be unequal because the signs of y and z differ.
>
> Maybe something like
>
> all(mapply(`==`, list(pos, neg, zero, y), list(pos, neg, zero, z)))
>
> ## or even
>
> isTRUE(all.equal( list(pos, neg, zero, y), list(pos, neg, zero, z),
> tol = 0))

I think I didn't make my point clearly. I'm not particularly worried
about lists of numbers, I'm worried about signed zeros buried in complex
structures. identical(struc1, struc2) works nicely now for that sort of
comparison; indeed the man page for it says:

     A call to 'identical' is the way to test exact equality in 'if'
     and 'while' statements, as well as in logical expressions that use
     '&&' or '||'. In all these applications you need to be assured of
     getting a single logical value.

The description you quote below does contradict this, and it also
contradicts the statement

     'identical' sees 'NaN' as different from 'NA_real_', but all
     'NaN's are equal (and all 'NA' of the same type are equal).
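
A small sketch of what that statement describes, as far as I understand the documented behaviour (the names nan1 and nan2 are only illustrative):

```r
# Two NaN values produced in different ways (illustrative names).
nan1 <- NaN
nan2 <- 0 / 0

identical(nan1, nan2)             # all NaNs are treated as equal
identical(NaN, NA_real_)          # NaN is seen as different from NA_real_
identical(NA_real_, NA_real_)     # NAs of the same type are equal
identical(NA_real_, NA_integer_)  # NAs of different types are not
```

So identical() is already documented as ignoring some differences that a bitwise comparison could see.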

I think the solution is to fix the man page, not the function. For
example, the "_exactly_" seems to be what is upsetting you; I'd suggest
instead

     "The safe and reliable way to test two objects for being equal in
     structure and content. It returns 'TRUE' in this case, 'FALSE' in
     every other case."
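
To make the concern about signed zeros buried in structures concrete, here is a minimal sketch. The nesting and the component names (id, data, scale, offset) are hypothetical; only pos, neg, zero, y and z come from the earlier example.

```r
pos  <- 1
neg  <- -1
zero <- 0
y <- zero * pos   # +0
z <- zero * neg   # -0

# Hypothetical nested structures that differ only in the sign of one zero.
struc1 <- list(id = "a", data = list(scale = pos, offset = y))
struc2 <- list(id = "a", data = list(scale = pos, offset = z))

y == z                     # TRUE: the two zeros test equal
identical(struc1, struc2)  # TRUE with the current '=='-based comparison
c(1 / y, 1 / z)            # Inf -Inf: the sign only shows up indirectly
```

With a bitwise comparison of doubles, as in the proposed patch, the identical() call here would presumably return FALSE even though every element tests equal.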

Duncan Murdoch

>
> the latter of which adapts more flexibly to what the user
> really wants to test.
>
> >> Further note that help(identical) has always said
> >>
> >> > Description:
> >>
> >> > The safe and reliable way to test two objects for being _exactly_
> >> > equal. It returns 'TRUE' in this case, 'FALSE' in every other case.
> >>
> >> which really should distinguish -0 and +0
> >>
> >>
> >> >> Duncan Murdoch
> >> >>> Martin Maechler, ETH Zurich
> >>
> TH> My own view of this is that there may in certain circumstances be an
> TH> interest in distinguishing between 0 and (-0), yet normally most
> TH> users will simply want to compare the numerical values.
> >>
> TH> Therefore I am in favour of revising identical() so that it can so
> TH> distinguish; but also of taking the opportunity to give it a parameter
> TH> say
> >>
> TH> identical(x,y,sign.bit=FALSE)
> >>
> TH> so that the default behaviour would be to see 0 and (-0) as identical,
> TH> but with sign.bit=TRUE it would see the difference.
> >>
> TH> However, I put this forward in ignorance of
> TH> a) Any difficulties that this may present in re-coding identical();
> TH> b) Any complications that may arise when applying this new form
> TH> to complex objects.
> >>
> >> Your proposal would actually need to special-case this one case,
> >> whereas my patch replaces '==' (in C) for doubles with
> >> memcmp(), something which is already
> >> used for several other cases there, and hence seems more
> >> consistent and in that way natural.
> >>
> >> The one thing even the new code would not differentiate is the
> >> different NaN's (apart from NA), but they are not distinguishable
> >> at the R level either, AFAIK; at least AFAIU our language
> >> specifications, we only want two things: NA and NaN
>
> DM> I don't understand what you are proposing now. The different NaN's have
> DM> different bit patterns, so wouldn't memcmp() see a difference? And
> DM> taking your literalist point of view, the fact that it is hard to detect
> DM> the difference at the R level (requiring C code support to do it)
> DM> doesn't mean there is no difference, there's just a very subtle, rarely
> DM> detectable difference, like the one between +0 and -0.
>
> DM> Duncan Murdoch
>
> >>
> >> Martin
> >>
> >> ______________________________________________
> >> R-devel at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel