[Rd] identical(0, -0)

Tue Aug 11 12:02:01 CEST 2009

On Tue, Aug 11, 2009 at 10:04:20AM +0200, Martin Maechler wrote:
> >>>>> "DM" == Duncan Murdoch <murdoch at stats.uwo.ca>
> >>>>>     on Mon, 10 Aug 2009 11:51:53 -0400 writes:
> 
>     DM> For people who want to play with these, here are some functions that let 
>     DM> you get or set the "payload" value in a NaN.  NaN and NA, Inf and -Inf 
>     DM> are stored quite similarly; these functions don't distinguish which of 
>     DM> those you're working with.  Regular finite values give NA for the 
>     DM> payload value, and elements of x are unchanged if you try to set their 
>     DM> payload to NA.
> 
>     DM> By the way, this also shows that R *can* distinguish different NaN 
>     DM> values, but you need some byte-level manipulations.
> 
> yes;  very nice code, indeed!
> 
> I propose a version of the  showBytes()  utility should be added 
> either as an example e.g. in  writeBin() or even an exported
> function in package 'utils'
> 
>  [.........]
> 
>     > Example:
> 
>     >> x <- c(NA, NaN, 0, 1, Inf)
>     >> NaNpayload(x)
>     > [1]  0.5 -0.5   NA   NA  0.0
> 
> Interestingly, on 64-bit, I get a slightly different answer above, 
> (when all the following code gives exactly the same results,
>  and of course, that was your main point !), namely
> 4.338752e-13 instead of 0.5  for 'NA', 
> see below.
> 
> .. and your nice tools also let me detect an even simpler way
> to get *two* versions of NA, and NaN, each :  
> Conclusion:  Both  NaN  and  NA (well NA_real_) have a sign, too !
> 
> NaNpayload(NA_real_)
> ##[1] 4.338752e-13
> NaNpayload(-NA_real_)
> ##[1] -4.338752e-13   ## !! different
> 
> str(NApm <- c(1[2], -1[2]))
> t(sapply(NApm, showBytes))
> ## [1,]   a2   07   00   00   00   00   f0*   7f
> ## [2,]   a2   07   00   00   00   00   f0*   ff
> 
> ## or summarizing things :
> 
> ## Or, "in summary" -- Duncan's original example slightly extended:
> x <- c(NaN, -NaN, NA, -NA_real_, 0, 0.1, Inf, -Inf)
> x
> names(x) <- format(x)
> sapply(x, showBytes)
> ##       NaN  NaN   NA   NA  0.0  0.1  Inf -Inf
> ## [1,]   00   00   a2   a2   00   9a   00   00
> ## [2,]   00   00   07   07   00   99   00   00
> ## [3,]   00   00   00   00   00   99   00   00
> ## [4,]   00   00   00   00   00   99   00   00
> ## [5,]   00   00   00   00   00   99   00   00
> ## [6,]   00   00   00   00   00   99   00   00
> ## [7,]   f8   f8   f8*  f8*  00   b9   f0   f0
> ## [8,]   ff   7f   7f   ff   00   3f   7f   ff
> 
> ## (*) NOTE: the  'f0*' or 'f8*' above  are
> ## ---       'f8' on 32-bit,  'f0' on 64-bit
> 
> 
> 
>     >> NaNpayload(x) <- -0.4
>     >> x
>     > [1] NaN NaN NaN NaN NaN
>     >> y <- x
>     >> NaNpayload(y) <- 0.6
>     >> y
>     > [1] NaN NaN NaN NaN NaN
>     >> NaNpayload(x)
>     > [1] -0.4 -0.4 -0.4 -0.4 -0.4
>     >> NaNpayload(y)
>     > [1] 0.6 0.6 0.6 0.6 0.6
>     >> identical(x, y)
>     > [1] TRUE
> 

The above examples convince me that the default behavior of identical()
should not be based on bit patterns, since the differences between
different NaNs, or even different NAs, are irrelevant unless we
manipulate the bits explicitly.
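
To make the distinction between value and bit pattern concrete, here is a
minimal sketch for 0 and -0. The helper bytes_of() is a hypothetical name
introduced only for this illustration; it is not the showBytes() function
from the quoted message, and it simply relies on writeBin() with an explicit
byte order.

```r
# Hypothetical helper (not the showBytes() from the quoted message):
# return the 8 bytes of one double, least significant byte first.
bytes_of <- function(x) {
  stopifnot(is.double(x), length(x) == 1L)
  writeBin(x, raw(), endian = "little")
}

pos_zero <- 0
neg_zero <- -0

bytes_of(pos_zero)   # eight 00 bytes
bytes_of(neg_zero)   # seven 00 bytes, then 80 (the sign bit)

# The bit patterns differ, but the values compare equal.
identical(bytes_of(pos_zero), bytes_of(neg_zero))   # FALSE
pos_zero == neg_zero                                # TRUE
```

The difference is visible only when we ask for the bytes; ordinary
comparison of the values does not see it.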

Let me suggest the following short description in ?identical

  The safe and reliable way to test two objects for being equal in
  structure, types of components and their values. It returns 'TRUE' in
  this case, 'FALSE' in every other case.

and replacing the paragraph

   'identical' sees 'NaN' as different from 'NA_real_', but all
   'NaN's are equal (and all 'NA' of the same type are equal).

in ?identical by

  Comparison of objects of numeric type uses '==' for comparison of their
  components. This means that the values of the components, rather
  than their machine representations, are compared. In particular,
  '0' and '-0' are considered equal, all 'NA's of the same type are
  equal and all 'NaN's are equal, although their bit patterns may 
  differ in some cases. 'NA' and 'NaN' are always different. Note 
  also that 1/0 and 1/(-0) are different.
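
The cases listed in this paragraph can be checked with a short sketch such
as the following. It assumes a current R session in which identical() is
called with its default arguments; the variable names are chosen only for
this illustration.

```r
# Value-based comparison of doubles, as in the proposed paragraph.
pos_zero <- 0
neg_zero <- -0

pos_zero == neg_zero                    # TRUE: equal as values
identical(pos_zero, neg_zero)           # TRUE with the default arguments

# The sign of zero still shows up in arithmetic.
pos_inf <- 1 / pos_zero                 # Inf
neg_inf <- 1 / neg_zero                 # -Inf
identical(pos_inf, neg_inf)             # FALSE

# NA and NaN are distinguished from each other, but not by sign.
identical(NA_real_, NaN)                # FALSE
identical(NaN, -NaN)                    # TRUE
```

So the sign of zero matters only once it is propagated into a value that
actually differs, such as Inf versus -Inf.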

Petr.