[Rd] identical(0, -0)
Petr Savicky
savicky at cs.cas.cz
Tue Aug 11 12:02:01 CEST 2009
On Tue, Aug 11, 2009 at 10:04:20AM +0200, Martin Maechler wrote:
> >>>>> "DM" == Duncan Murdoch <murdoch at stats.uwo.ca>
> >>>>> on Mon, 10 Aug 2009 11:51:53 -0400 writes:
>
> DM> For people who want to play with these, here are some functions that let
> DM> you get or set the "payload" value in a NaN. NaN and NA, Inf and -Inf
> DM> are stored quite similarly; these functions don't distinguish which of
> DM> those you're working with. Regular finite values give NA for the
> DM> payload value, and elements of x are unchanged if you try to set their
> DM> payload to NA.
>
> DM> By the way, this also shows that R *can* distinguish different NaN
> DM> values, but you need some byte-level manipulations.
>
> yes; very nice code, indeed!
>
> I propose a version of the showBytes() utility should be added
> either as an example e.g. in writeBin() or even an exported
> function in package 'utils'
>
> [.........]
>
> > Example:
>
> >> x <- c(NA, NaN, 0, 1, Inf)
> >> NaNpayload(x)
> > [1] 0.5 -0.5 NA NA 0.0
>
> Interestingly, on 64-bit, I get a slightly different answer above,
> (when all the following code gives exactly the same results,
> and of course, that was your main point !), namely
> 4.338752e-13 instead of 0.5 for 'NA',
> see below.
>
> .. and your nice tools also let me detect an even simpler way
> to get *two* versions of NA, and NaN, each :
> Conclusion: Both NaN and NA (well NA_real_) have a sign, too !
>
> NaNpayload(NA_real_)
> ##[1] 4.338752e-13
> NaNpayload(-NA_real_)
> ##[1] -4.338752e-13 ## !! different
>
> str(NApm <- c(1[2], -1[2]))
> t(sapply(NApm, showBytes))
> ## [1,] a2 07 00 00 00 00 f0* 7f
> ## [2,] a2 07 00 00 00 00 f0* ff
>
> ## or summarizing things :
>
> ## Or, "in summary" -- Duncan's original example slightly extended:
> x <- c(NaN, -NaN, NA, -NA_real_, 0, 0.1, Inf, -Inf)
> x
> names(x) <- format(x)
> sapply(x, showBytes)
> ## NaN NaN NA NA 0.0 0.1 Inf -Inf
> ## [1,] 00 00 a2 a2 00 9a 00 00
> ## [2,] 00 00 07 07 00 99 00 00
> ## [3,] 00 00 00 00 00 99 00 00
> ## [4,] 00 00 00 00 00 99 00 00
> ## [5,] 00 00 00 00 00 99 00 00
> ## [6,] 00 00 00 00 00 99 00 00
> ## [7,] f8 f8 f8* f8* 00 b9 f0 f0
> ## [8,] ff 7f 7f ff 00 3f 7f ff
>
> ## (*) NOTE: the 'f0*' or 'f8*' above are
> ## --- 'f8' on 32-bit, 'f0' on 64-bit
>
>
>
> >> NaNpayload(x) <- -0.4
> >> x
> > [1] NaN NaN NaN NaN NaN
> >> y <- x
> >> NaNpayload(y) <- 0.6
> >> y
> > [1] NaN NaN NaN NaN NaN
> >> NaNpayload(x)
> > [1] -0.4 -0.4 -0.4 -0.4 -0.4
> >> NaNpayload(y)
> > [1] 0.6 0.6 0.6 0.6 0.6
> >> identical(x, y)
> > [1] TRUE
>
The above examples convince me that the default behavior of identical()
should not be based on bit patterns, since the differences between
different NaNs, or even different NAs, are irrelevant unless we
manipulate the bits explicitly.

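The point can be checked without the helper functions from the quoted message. The following is only a sketch: show_bytes() is a hypothetical name introduced here, not the showBytes() discussed above, and it relies on nothing but base R's writeBin(). The exact NA and NaN bytes may vary by platform, as the quoted output already notes.

```r
# Hypothetical helper (not the showBytes() from the quoted message):
# returns the 8 bytes of one double as hex strings, least significant first.
show_bytes <- function(x) {
  stopifnot(is.double(x), length(x) == 1L)
  as.character(writeBin(x, raw(), endian = "little"))
}

vals <- c(zero = 0, neg_zero = -0, nan = NaN, neg_nan = -NaN,
          na = NA_real_, neg_na = -NA_real_)

# One column per value; within each pair only the last byte
# (sign bit plus high exponent bits) is expected to differ.
bytes <- sapply(vals, show_bytes)
print(bytes, quote = FALSE)

# In ordinary use the members of each pair behave the same.
zero_equal <- vals[["zero"]] == vals[["neg_zero"]]
all_na     <- all(is.na(vals[c("nan", "neg_nan", "na", "neg_na")]))
```

Here zero_equal and all_na summarize what ordinary code sees, while the printed matrix shows the representation that only byte-level tools expose.
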
Let me suggest the following short description in ?identical

  The safe and reliable way to test two objects for being equal in
  structure, types of components and their values. It returns 'TRUE' in
  this case, 'FALSE' in every other case.

and replacing the paragraph

  'identical' sees 'NaN' as different from 'NA_real_', but all
  'NaN's are equal (and all 'NA' of the same type are equal).

in ?identical by

  Comparison of objects of numeric type uses '==' for comparison of their
  components. This means that the values of the components rather
  than their machine representations are compared. In particular,
  '0' and '-0' are considered equal, all 'NA's of the same type are
  equal and all 'NaN's are equal, although their bit patterns may
  differ in some cases. 'NA' and 'NaN' are always different. Note
  also that 1/0 and 1/(-0) are different.

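A short check of the cases named in the proposed paragraph, written as a sketch against identical() with its default arguments in current R releases. The variable names are illustrative only.

```r
pos_zero <- 0
neg_zero <- -0

# '0' and '-0' are equal as values ...
zeros_equal     <- pos_zero == neg_zero            # TRUE
zeros_identical <- identical(pos_zero, neg_zero)   # TRUE

# ... yet they are distinguishable through division.
recip_identical <- identical(1 / pos_zero, 1 / neg_zero)  # FALSE: Inf vs -Inf

# 'NA' and 'NaN' are different; NaNs that differ only in sign are not.
na_vs_nan  <- identical(NA_real_, NaN)   # FALSE
nan_vs_nan <- identical(NaN, -NaN)       # TRUE

print(c(zeros_equal = zeros_equal, zeros_identical = zeros_identical,
        recip_identical = recip_identical, na_vs_nan = na_vs_nan,
        nan_vs_nan = nan_vs_nan))
```
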
Petr.