[Rd] identical(0, -0)

Fri Aug 7 18:55:50 CEST 2009

On 8/7/2009 11:41 AM, Martin Maechler wrote:
>>>>>> "DM" == Duncan Murdoch <murdoch at stats.uwo.ca>
>>>>>>     on Fri, 07 Aug 2009 11:25:11 -0400 writes:
> 
>     DM> On 8/7/2009 10:46 AM, Martin Maechler wrote:
>     >>>>>>> "TH" == Ted Harding <Ted.Harding at manchester.ac.uk>
>     >>>>>>> on Fri, 07 Aug 2009 14:49:54 +0100 (BST) writes:
>     >> 
>     TH> On 07-Aug-09 11:07:08, Duncan Murdoch wrote:
>     >> >> Martin Maechler wrote:
>     >> >>>>>>>> William Dunlap <wdunlap at tibco.com>
>     >> >>>>>>>> on Thu, 6 Aug 2009 15:06:08 -0700 writes:
>     >> >>> >> -----Original Message-----
>     >> >>> >> From: r-help-bounces at r-project.org
>     >> >>> >> [mailto:r-help-bounces at r-project.org] On Behalf Of Giovanni Petris
>     >> >>> >> Sent: Thursday, August 06, 2009 3:00 PM
>     >> >>> >> To: milton.ruser at gmail.com
>     >> >>> >> Cc: r-help at r-project.org; Daniel.Gerlanc at geodecapital.com
>     >> >>> >> Subject: Re: [R] Why is 0 not an integer?
>     >> >>> >> 
>     >> >>> >> 
>     >> >>> >> I ran an instant experiment...
>     >> >>> >> 
>     >> >>> >> > typeof(0)
>     >> >>> >> [1] "double"
>     >> >>> >> > typeof(-0)
>     >> >>> >> [1] "double"
>     >> >>> >> > identical(0, -0)
>     >> >>> >> [1] TRUE
>     >> >>> >> 
>     >> >>> >> Best, Giovanni
>     >> >>> 
>     >> >>> > But 0.0 and -0.0 have different reciprocals
>     >> >>> 
>     >> >>> >> 1.0/0.0
>     >> >>> >    [1] Inf
>     >> >>> >> 1.0/-0.0
>     >> >>> >    [1] -Inf
>     >> >>> 
>     >> >>> > Bill Dunlap TIBCO Software Inc - Spotfire Division wdunlap
>     >> >>> > tibco.com
>     >> >>> 
>     >> >>> yes.  {finally something interesting in this boring thread !}
>     ---> diverting to R-devel
>     >> >>> 
>     >> >>> In April, I had a private e-mail exchange with John
>     >> >>> Chambers [father of S, notably S4, which also brought identical()]
>     >> >>> and Bill about the topic,
>     >> >>> where I had started suggesting that  R  should be changed such
>     >> >>> that
>     >> >>> identical(-0. , +0.)
>     >> >>> would return FALSE.
>     >> >>> Bill did mention that it does so for (newish versions of) S+
>     >> >>> and that he'd prefer that, too,
>     >> >>> and John said
>     >> >>> 
>     >> >>> >> I agree on having a preference for a bitwise comparison for
>     >> >>> >> identical()---that's what the name means after all.  But since
>     >> >>> >> someone implemented the numerical case as the C == it's probably
>     >> >>> >> going to be more hassle than it's worth to change it.  But we
>     >> >>> >> should make the implementation clear in the documentation.
>     >> >>> 
>     >> >>> so in principle, we all agreed that R's identical() should be
>     >> >>> changed here, namely by using something like  memcmp() instead
>     >> >>> of simple '==' ,  however we haven't bothered to actually 
>     >> >>> *implement* this change.
>     >> >>> 
>     >> >>> I am currently testing a patch which would make
>     >> >>> identical(0, -0) return FALSE.
>     >> >>> 
>     >> >> I don't think that would be a good idea.  Other expressions besides
>     >> >> "-0" calculate the zero with the negative sign bit, e.g. the
>     >> >> following sequence:
>     >> >> 
>     >> >> pos <- 1
>     >> >> neg <- -1
>     >> >> zero <- 0
>     >> >> y <- zero*pos
>     >> >> z <- zero*neg
>     >> >> identical(y, z)
>     >> >> 
>     >> >> I think most R users would expect the last expression there to be
>     >> >> TRUE based on the previous two lines, given that pos and neg both
>     >> >> have finite values. In a simple case like this y == z would be a
>     >> >> better test to use, but if those were components of a larger
>     >> >> structure, identical() is all we've got, and people would waste a
>     >> >> lot of time tracking down why structures differing only in the
>     >> >> sign of zero were not identical, even though every element tested
>     >> >> equal.
>     >> 
>     >> identical()  *is* not the same as '=='  even if you think of a
>     >> generalized '==',
>     >> and your example is not convincing to me.
> 
>     DM> Fair enough, but after your change, how would one do what 
>     DM> identical(list(pos, neg, zero, y), list(pos, neg, zero, z)) does now? 
>     DM> That seems to me to be a more useful comparison than one that declares 
>     DM> those to be unequal because the signs of y and z differ.
> 
> Maybe something like
> 
> all(mapply(`==`,  list(pos, neg, zero, y), list(pos, neg, zero, z)))
> 
> ## or even
> 
> isTRUE(all.equal( list(pos, neg, zero, y), list(pos, neg, zero, z),
>                  tol = 0))

I think I didn't make my point clearly.  I'm not particularly worried 
about lists of numbers, I'm worried about signed zeros buried in complex 
structures.  identical(struc1, struc2) works nicely now for that sort of 
comparison; indeed the man page for it says:

      A call to 'identical' is the way to test exact equality in 'if'
      and 'while' statements, as well as in logical expressions that use
      '&&' or '||'.  In all these applications you need to be assured of
      getting a single logical value.

The description you quote below does contradict this, and it also 
contradicts the statement

      'identical' sees 'NaN' as different from 'NA_real_', but all
      'NaN's are equal (and all 'NA' of the same type are equal).

I think the solution is to fix the man page, not the function.  For 
example, the "_exactly_" seems to be what is upsetting you; I'd suggest 
instead

"The safe and reliable way to test two objects for being equal in 
structure and content.  It returns 'TRUE' in this case, 'FALSE' in every 
other case."

Duncan Murdoch
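
To make the concern about signed zeros inside larger structures concrete, here is a minimal sketch in R. The nested list layout (`struc1`, `struc2` and their component names) is invented for illustration; only `pos`, `neg`, `zero`, `y` and `z` come from the example quoted in this thread. The final result assumes identical() compares doubles the way the thread describes, i.e. with C's '=='.

```r
# Values from the example quoted in this thread
pos  <- 1
neg  <- -1
zero <- 0
y <- zero * pos   # +0
z <- zero * neg   # -0

# Two nested structures that differ only in the sign of one zero
# (hypothetical layout, for illustration only)
struc1 <- list(id = "a", pars = list(scale = pos, shift = y))
struc2 <- list(id = "a", pars = list(scale = pos, shift = z))

y == z                     # TRUE: the two zeros compare equal
1 / y                      # Inf
1 / z                      # -Inf: the sign bit is still there
identical(struc1, struc2)  # TRUE when doubles are compared with '=='
```

With a bitwise comparison of doubles, the last call would instead return FALSE, even though every element tests equal with '=='.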

> 
> the latter of which is more flexible and adaptable to what the user
> really wants to test.
> 
>     >> Further note that help(identical)  has always said
>     >> 
>     >> > Description:
>     >> 
>     >> >    The safe and reliable way to test two objects for being _exactly_
>     >> >    equal.  It returns 'TRUE' in this case, 'FALSE' in every other case.
>     >> 
>     >> which really should distinguish  -0 and +0
>     >> 
>     >> 
>     >> >> Duncan Murdoch
>     >> >>> Martin Maechler, ETH Zurich
>     >> 
>     TH> My own view of this is that there may in certain circumstances be an
>     TH> interest in distinguishing between 0 and (-0), yet normally most
>     TH> users will simply want to compare the numerical values.
>     >> 
>     TH> Therefore I am in favour of revising identical() so that it can so
>     TH> distinguish; but also of taking the opportunity to give it a parameter
>     TH> say
>     >> 
>     TH> identical(x,y,sign.bit=FALSE)
>     >> 
>     TH> so that the default behaviour would be to see 0 and (-0) as identical,
>     TH> but with sign.bit=TRUE it would see the difference.
>     >> 
>     TH> However, I put this forward in ignorance of
>     TH> a) Any difficulties that this may present in re-coding identical();
>     TH> b) Any complications that may arise when applying this new form
>     TH> to complex objects.
>     >> 
>     >> Your proposal would actually need to special-case this one case,
>     >> whereas my patch replaces the use of '==' (in C) for doubles
>     >> with memcmp(), something which is already used for several
>     >> other cases there, and hence seems more consistent and in that
>     >> way natural.
>     >> 
>     >> The one thing even the new code would not differentiate is the
>     >> different NaNs (apart from NA), but they are not distinguishable
>     >> at the R level either, AFAIK.  At least AFAIU our language
>     >> specification, we only want two things: NA and NaN.
> 
>     DM> I don't understand what you are proposing now.  The different NaN's have 
>     DM> different bit patterns, so wouldn't memcmp() see a difference?  And 
>     DM> taking your literalist point of view, the fact that it is hard to detect 
>     DM> the difference at the R level (requiring C code support to do it) 
>     DM> doesn't mean there is no difference, there's just a very subtle, rarely 
>     DM> detectable difference, like the one between +0 and -0.
> 
>     DM> Duncan Murdoch
> 
>     >> 
>     >> Martin
>     >> 
>     >> ______________________________________________
>     >> R-devel at r-project.org mailing list
>     >> https://stat.ethz.ch/mailman/listinfo/r-devel
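
As a rough illustration of the sign.bit idea proposed in the thread, the following R-level sketch may help. The function name `identical_sb` is hypothetical and not part of R. It only adds the extra check for plain double vectors, and it assumes that identical() itself treats 0 and -0 as equal, as described in the discussion above.

```r
# Hypothetical helper (not part of R): identical() plus an optional
# check on the sign of zeros, for plain double vectors only.
# Assumes identical(0, -0) is TRUE, as described in this thread.
identical_sb <- function(x, y, sign.bit = FALSE) {
  same <- identical(x, y)
  if (!same || !sign.bit || !is.double(x)) return(same)
  zeros <- which(x == 0)   # which() skips NA and NaN positions
  # 1/+0 is Inf and 1/-0 is -Inf, so the reciprocals expose the sign bit
  identical(1 / x[zeros], 1 / y[zeros])
}

identical_sb(0, -0)                                      # TRUE
identical_sb(0, -0, sign.bit = TRUE)                     # FALSE
identical_sb(c(1, 0, NA), c(1, 0, NA), sign.bit = TRUE)  # TRUE
```

This sketch does not recurse into lists or other complex objects, which is exactly the complication raised in point b) of the proposal.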