[Rd] identical(0, -0)

Sat Aug 8 15:04:07 CEST 2009

>>>>> "DM" == Duncan Murdoch <murdoch at stats.uwo.ca>
>>>>>     on Fri, 07 Aug 2009 12:55:50 -0400 writes:

    DM> On 8/7/2009 11:41 AM, Martin Maechler wrote:
    >>>>>>> "DM" == Duncan Murdoch <murdoch at stats.uwo.ca>
    >>>>>>> on Fri, 07 Aug 2009 11:25:11 -0400 writes:
    >> 
    DM> On 8/7/2009 10:46 AM, Martin Maechler wrote:
    >> >>>>>>> "TH" == Ted Harding <Ted.Harding at manchester.ac.uk>
    >> >>>>>>> on Fri, 07 Aug 2009 14:49:54 +0100 (BST) writes:
    >> >> 
    TH> On 07-Aug-09 11:07:08, Duncan Murdoch wrote:
    >> >> >> Martin Maechler wrote:
    >> >> >>>>>>>> William Dunlap <wdunlap at tibco.com>
    >> >> >>>>>>>> on Thu, 6 Aug 2009 15:06:08 -0700 writes:
    >> >> >>> >> -----Original Message-----
    >> >> >>> >> From: r-help-bounces at r-project.org
    >> >> >>> >> [mailto:r-help-bounces at r-project.org] On Behalf Of Giovanni Petris
    >> >> >>> >> Sent: Thursday, August 06, 2009 3:00 PM
    >> >> >>> >> To: milton.ruser at gmail.com
    >> >> >>> >> Cc: r-help at r-project.org; Daniel.Gerlanc at geodecapital.com
    >> >> >>> >> Subject: Re: [R] Why is 0 not an integer?
    >> >> >>> >> 
    >> >> >>> >> 
    >> >> >>> >> I ran an instant experiment...
    >> >> >>> >> 
    >> >> >>> >> > typeof(0)
    >> >> >>> >> [1] "double"
    >> >> >>> >> > typeof(-0)
    >> >> >>> >> [1] "double"
    >> >> >>> >> > identical(0, -0)
    >> >> >>> >> [1] TRUE
    >> >> >>> >> 
    >> >> >>> >> Best, Giovanni
    >> >> >>> 
    >> >> >>> > But 0.0 and -0.0 have different reciprocals
    >> >> >>> 
    >> >> >>> >> 1.0/0.0
    >> >> >>> >    [1] Inf
    >> >> >>> >> 1.0/-0.0
    >> >> >>> >    [1] -Inf
    >> >> >>> 
    >> >> >>> > Bill Dunlap
    >> >> >>> > TIBCO Software Inc - Spotfire Division
    >> >> >>> > wdunlap tibco.com
    >> >> >>> 
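Bill's reciprocal trick is the usual way to observe the sign of a zero. Here is a minimal C sketch of it, assuming IEEE 754 doubles (which R uses for its numeric vectors). `is_negative_zero()` is a hypothetical helper, not part of R's API:

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: true only for -0.0.
   Under IEEE 754, 1/+0.0 is +Inf and 1/-0.0 is -Inf,
   so the sign of the reciprocal reveals the sign of the zero. */
bool is_negative_zero(double x)
{
    return x == 0.0 && 1.0 / x < 0.0;
}
```

Note that `0.0 == -0.0` is true in C, so a `==`-based comparison cannot see the difference. `signbit()` from `<math.h>` reports the same sign without a division.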
    >> >> >>> yes.  {finally something interesting in this boring thread !}
    ---> diverting to R-devel
    >> >> >>> 
    >> >> >>> In April, I had a private e-mail communication with John
    >> >> >>> Chambers [father of S, notably S4, which also brought identical()]
    >> >> >>> and Bill about the topic,
    >> >> >>> where I had started suggesting that  R  should be changed such
    >> >> >>> that
    >> >> >>> identical(-0. , +0.)
    >> >> >>> would return FALSE.
    >> >> >>> Bill did mention that it does so for (newish versions of) S+
    >> >> >>> and that he'd prefer that, too,
    >> >> >>> and John said
    >> >> >>> 
    >> >> >>> >> I agree on having a preference for a bitwise comparison for
    >> >> >>> >> identical()---that's what the name means after all.  But since
    >> >> >>> >> someone implemented the numerical case as the C == it's probably
    >> >> >>> >> going to be more hassle than it's worth to change it.  But we
    >> >> >>> >> should make the implementation clear in the documentation.
    >> >> >>> 
    >> >> >>> so in principle, we all agreed that R's identical() should be
    >> >> >>> changed here, namely by using something like  memcmp() instead
    >> >> >>> of simple '==';  however, we haven't bothered to actually 
    >> >> >>> *implement* this change.
    >> >> >>> 
    >> >> >>> I am currently testing a patch which would make
    >> >> >>> identical(0, -0)  return FALSE.
    >> >> >>> 
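The two comparisons Martin contrasts can be shown in a few lines of C. This is a sketch, not R's actual identical.c code, and `same_by_eq()` and `same_by_memcmp()` are hypothetical names:

```c
#include <stdbool.h>
#include <string.h>

/* Comparison in the style of the old code: IEEE equality,
   under which +0.0 == -0.0 holds. */
bool same_by_eq(double x, double y)
{
    return x == y;
}

/* Comparison in the style of the proposed patch: bytewise,
   so +0.0 and -0.0 differ because their sign bits differ. */
bool same_by_memcmp(double x, double y)
{
    return memcmp(&x, &y, sizeof x) == 0;
}
```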
    >> >> >> I don't think that would be a good idea.  Other expressions besides
    >> >> >> "-0" produce a zero with the negative sign bit set, e.g. the
    >> >> >> following sequence:
    >> >> >> 
    >> >> >> pos <- 1
    >> >> >> neg <- -1
    >> >> >> zero <- 0
    >> >> >> y <- zero*pos
    >> >> >> z <- zero*neg
    >> >> >> identical(y, z)
    >> >> >> 
    >> >> >> I think most R users would expect the last expression there to be
    >> >> >> TRUE based on the previous two lines, given that pos and neg both
    >> >> >> have finite values. In a simple case like this y == z would be a
    >> >> >> better test to use, but if those were components of a larger
    >> >> >> structure, identical() is all we've got, and people would waste a
    >> >> >> lot of time tracking down why structures differing only in the
    >> >> >> sign of zero were not identical, even though every element tested
    >> >> >> equal.
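Duncan's sequence behaves the same way in C, because R doubles follow IEEE 754 multiplication rules. Here is a minimal sketch; `product_sign_differs()` is a hypothetical helper:

```c
#include <math.h>
#include <stdbool.h>

/* Mirrors Duncan's R sequence: multiplying +0.0 by a negative
   finite number yields -0.0 under IEEE 754 rules. */
bool product_sign_differs(double zero, double pos, double neg)
{
    double y = zero * pos;   /* +0.0 when zero is +0.0 and pos > 0 */
    double z = zero * neg;   /* -0.0 when zero is +0.0 and neg < 0 */
    /* y == z holds, yet exactly one of them has the sign bit set. */
    return y == z && (signbit(y) != 0) != (signbit(z) != 0);
}
```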
    >> >> 
    >> >> identical()  *is* not the same as '=='  even if you think of a
    >> >> generalized '==',
    >> >> and your example is not convincing to me.
    >> 
    DM> Fair enough, but after your change, how would one do what 
    DM> identical(list(pos, neg, zero, y), list(pos, neg, zero, z)) does now? 
    DM> That seems to me to be a more useful comparison than one that declares 
    DM> those to be unequal because the signs of y and z differ.
    >> 
    >> Maybe something like
    >> 
    >> all(mapply(`==`,  list(pos, neg, zero, y), list(pos, neg, zero, z)))
    >> 
    >> ## or even
    >> 
    >> isTRUE(all.equal( list(pos, neg, zero, y), list(pos, neg, zero, z),
    >> tol = 0))
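The alternatives above differ in whether they compare values or bytes. Below is a rough C analogue for a flat vector of doubles that ignores NA/NaN handling. Both functions are hypothetical helpers: `all_equal_values()` compares values, roughly like the mapply(`==`, ...) suggestion, and `all_identical_bits()` compares bytes, as a memcmp()-based identical() would:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Elementwise numeric equality: -0.0 and +0.0 compare equal. */
bool all_equal_values(const double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!(a[i] == b[i]))
            return false;
    return true;
}

/* Bytewise comparison of the whole buffer:
   any differing sign bit makes the comparison fail. */
bool all_identical_bits(const double *a, const double *b, size_t n)
{
    return memcmp(a, b, n * sizeof *a) == 0;
}
```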

    DM> I think I didn't make my point clearly.  I'm not particularly worried 
    DM> about lists of numbers, I'm worried about signed zeros buried in complex 
    DM> structures.  identical(struc1, struc2) works nicely now for that sort of 
    DM> comparison; indeed the man page for it says:

and so does isTRUE(all.equal(..)) as given above.

For me, all your arguments point to all.equal(..., tol=0)

    DM> indeed the man page for it says:

    DM> A call to 'identical' is the way to test exact equality in 'if'
    DM> and 'while' statements, as well as in logical expressions that use
    DM> '&&' or '||'.  In all these applications you need to be assured of
    DM> getting a single logical value.

Yes, note the word "exact" ..
but see below

    DM> The description you quote below does contradict this, and it also 
    DM> contradicts the statement

    DM> 'identical' sees 'NaN' as different from 'NA_real_', but all
    DM> 'NaN's are equal (and all 'NA' of the same type are equal).

which makes sense as I think they cannot be distinguished by R,
but even here, I could think of a case where I'd like identical()
to be less lenient....  
Maybe we should think of a 3rd optional argument, along the
lines Ted suggested (but with a different default than his..).
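The NA/NaN distinction that identical() keeps is visible only in the bit pattern. Here is a sketch that assumes R's encoding of NA_real_, as in R's C sources: a NaN whose low 32-bit word is 1954. `make_na_real()` and `is_na_real()` are hypothetical stand-ins for R's internal helpers:

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed NA_real_ layout: high word 0x7FF00000, low word 1954. */
double make_na_real(void)
{
    uint64_t bits = UINT64_C(0x7FF00000000007A2);
    double x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* NA is a NaN whose low 32-bit word is 1954; any other NaN is "NaN". */
bool is_na_real(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return isnan(x) && (uint32_t)(bits & 0xFFFFFFFFu) == 1954u;
}
```

Only the low word is checked, so the test still works if the quiet bit gets set along the way.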

    DM> I think the solution is to fix the man page, not the
    DM> function.  

NO !!!!!

As I said very early:

identical() was introduced with S4, ca. 1998, by John Chambers.
The DESCRIPTION above is really what it should do !

In Splus 5.1 { 1999 }, one of the earliest publicly available
versions of S4,
    identical(0. , -0.)    already gives FALSE.

identical() was introduced into R for 1.4.0, spring 2002,
and given the above, it has never done what it should have.
Of course, that bug / problem  *is*  very rare and
typically not very consequential, and so we have all lived with
that buglet for 7 years...

Can you give a *real* {not contrived} example where the old use
was important?  Do you know of cases where users used
identical()  in cases they should have used  all.equal(*, tol=0)?

Maybe we should introduce a function that's basically
isTRUE(all.equal(..., tol=0))  {but faster},  or
do you want a 3rd argument to identical, say 'method'
with default  c("oneNaN", "use.==", "strict")?

oneNaN: my proposal of using  memcmp() on doubles, as it is already used
        for other types (and hence distinguishing +0 and -0),
        otherwise keeping the feature that there's just one NaN,
        which differs from 'NA' (and there's just one 'NA').

use.==: the previous R behaviour, using '==' on doubles
        (with the same NaN / NA handling as "oneNaN").

strict: be even stricter than oneNaN: use  memcmp()
        unconditionally for doubles.  This would be the fastest
        of all three versions.
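The three proposed methods can be sketched for a single pair of doubles. This is not R code. The enum names and `identical_double()` are hypothetical, and NA is assumed to be R's NaN with low 32-bit word 1954:

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for the proposed 'method' values. */
typedef enum { ONE_NAN, USE_EQ, STRICT } ident_method;

/* Assumes R's NA_real_ is the NaN whose low 32-bit word is 1954. */
static bool is_na(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return isnan(x) && (bits & 0xFFFFFFFFu) == 1954u;
}

bool identical_double(double x, double y, ident_method method)
{
    if (method == STRICT)                 /* every bit must match */
        return memcmp(&x, &y, sizeof x) == 0;

    /* ONE_NAN and USE_EQ share the NA/NaN rule:
       one NA, one NaN, and the two differ. */
    if (isnan(x) || isnan(y))
        return isnan(x) && isnan(y) && is_na(x) == is_na(y);

    if (method == USE_EQ)                 /* old behaviour: +0 equals -0 */
        return x == y;

    return memcmp(&x, &y, sizeof x) == 0; /* ONE_NAN: sign of zero counts */
}
```

Only the last line separates "oneNaN" from "use.==". For non-NaN doubles, bytewise and `==` comparison disagree only on the sign of zero.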

    DM> For 
    DM> example, the "_exactly_" seems to be what is upsetting you; I'd suggest 
    DM> instead

    DM> "The safe and reliable way to test two objects for being equal in 
    DM> structure and content.  It returns 'TRUE' in this case, 'FALSE' in every 
    DM> other case."

I don't think so, not at all.
That would rather be a description of   isTRUE(all.equal(..., tol=0))

    DM> Duncan Murdoch

    >> 
    >> the latter of which is more flexible, adaptable to what the user
    >> really wants to test.
    >> 
    >> >> Further note that help(identical)  has always said
    >> >> 
    >> >> > Description:
    >> >> 
    >> >> >    The safe and reliable way to test two objects for being _exactly_
    >> >> >    equal.  It returns 'TRUE' in this case, 'FALSE' in every other case.
    >> >> 
    >> >> which really should distinguish  -0 and +0
    >> >> 
    >> >> 
    >> >> >> Duncan Murdoch
    >> >> >>> Martin Maechler, ETH Zurich
    >> >> 
    TH> My own view of this is that there may in certain circumstances be an
    TH> interest in distinguishing between 0 and (-0), yet normally most
    TH> users will simply want to compare the numerical values.
    >> >> 
    TH> Therefore I am in favour of revising identical() so that it can so
    TH> distinguish; but also of taking the opportunity to give it a parameter
    TH> say
    >> >> 
    TH> identical(x,y,sign.bit=FALSE)
    >> >> 
    TH> so that the default behaviour would be to see 0 and (-0) as identical,
    TH> but with sign.bit=TRUE it would see the difference.
    >> >> 
    TH> However, I put this forward in ignorance of
    TH> a) Any difficulties that this may present in re-coding identical();
    TH> b) Any complications that may arise when applying this new form
    TH> to complex objects.
    >> >> 
    >> >> Your proposal would actually need to special-case this one case,
    >> >> unlike my patch, which replaces  '=='  (in C) for
    >> >> doubles by  memcmp(),  something which is already
    >> >> used for several other cases there, and hence seems more
    >> >> consistent and in that way natural.
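Ted's sign.bit idea, together with Martin's point that it needs a branch of its own, can be sketched in C. `identical_sign_bit()` is hypothetical, and NaN handling is left out so the zero branch stays visible:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical version of Ted's identical(x, y, sign.bit = FALSE):
   a bytewise comparison plus one branch that exists only to let
   +0.0 and -0.0 match when sign_bit is false. */
bool identical_sign_bit(double x, double y, bool sign_bit)
{
    if (!sign_bit && x == 0.0 && y == 0.0)
        return true;                       /* the special case for zero */
    return memcmp(&x, &y, sizeof x) == 0;
}
```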
    >> >> 
    >> >> The one thing even the new code would not differentiate is the
    >> >> different  NaNs (apart from NA), but they are not distinguishable
    >> >> at the R level either, AFAIK; at least as far as I understand our
    >> >> language specification, we only want two things: NA and NaN.
    >> 
    DM> I don't understand what you are proposing now.  The different NaN's have 
    DM> different bit patterns, so wouldn't memcmp() see a difference?  And 
    DM> taking your literalist point of view, the fact that it is hard to detect 
    DM> the difference at the R level (requiring C code support to do it) 
    DM> doesn't mean there is no difference, there's just a very subtle, rarely 
    DM> detectable difference, like the one between +0 and -0.
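Duncan is right that memcmp() would tell NaN payloads apart. Here is a sketch assuming the IEEE 754 binary64 layout; `nan_with_payload()` and `nan_bits_differ()` are hypothetical helpers:

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Builds a quiet NaN carrying a chosen payload in the low mantissa bits. */
double nan_with_payload(uint32_t payload)
{
    uint64_t bits = UINT64_C(0x7FF8000000000000) | payload;
    double x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* True when both arguments are NaN but their bytes differ,
   i.e. exactly the case a plain memcmp() would report as "not identical". */
bool nan_bits_differ(double x, double y)
{
    return isnan(x) && isnan(y) && memcmp(&x, &y, sizeof x) != 0;
}
```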
    >> 
    DM> Duncan Murdoch
    >> 
    >> >> 
    >> >> Martin
    >> >> 
    >> >> ______________________________________________
    >> >> R-devel at r-project.org mailing list
    >> >> https://stat.ethz.ch/mailman/listinfo/r-devel
