[Rd] Question re: NA, NaNs in R
Kevin Ushey
kevinushey at gmail.com
Mon Feb 10 07:52:13 CET 2014
Hi R-devel,
I have a question about the differentiation between NA and NaN values
as implemented in R. In arithmetic.c, we have
    int R_IsNA(double x)
    {
        if (isnan(x)) {
            ieee_double y;
            y.value = x;
            return (y.word[lw] == 1954);
        }
        return 0;
    }
ieee_double is just a union used for type punning, so that we can
inspect the lower word (y.word[lw]) and see whether it equals 1954; if
it does, x is NA, and if it does not, x is NaN (as defined for R_IsNaN).
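
For readers who want to try the low-word test outside of R, here is a minimal standalone sketch. It is not R's implementation: the helper names (double_bits, double_from_bits, is_na_sketch) are hypothetical, it uses memcpy instead of the ieee_double union, and it assumes a 64-bit IEEE 754 double whose byte order matches that of 64-bit integers. The value 1954 is the only constant taken from the code above.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the raw bits of a double into an integer. */
static uint64_t double_bits(double x)
{
    uint64_t bits;
    assert(sizeof bits == sizeof x);  /* assumption: double is 64 bits wide */
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Hypothetical helper: build a double from raw bits. */
static double double_from_bits(uint64_t bits)
{
    double x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Sketch of the low-word test: x is a NaN whose low 32 bits equal 1954. */
static int is_na_sketch(double x)
{
    return isnan(x) && (uint32_t)(double_bits(x) & 0xFFFFFFFFu) == 1954u;
}
```

With these helpers, is_na_sketch(double_from_bits(0x7FF00000000007A2ULL)) is true, while an ordinary quiet NaN such as double_from_bits(0x7FF8000000000000ULL) is not treated as NA. The first pattern (high word 0x7ff00000, low word 1954) is my assumption about how NA_REAL is laid out and should be checked against arithmetic.c.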
My question is -- I can see a substantial increase in speed (on my
computer, in certain cases) if I replace this check with
    int R_IsNA(double x)
    {
        return memcmp(
            (char*)(&x),
            (char*)(&NA_REAL),
            sizeof(double)
        ) == 0;
    }
IIUC, there is only one bit pattern used to encode R NA values, so
this should be safe. But I would like to be sure:
Is there any guarantee that the different functions in R would return
NA as identical to the bit pattern defined for NA_REAL, for a given
architecture? Similarly for NaN value(s) and R_NaN?
My guess is that some functions used internally by R might encode NaN
values differently, i.e., by setting the lower word to a value other
than 1954 (hence being NaN, but potentially not identical to R_NaN),
or perhaps this is architecture-dependent. However, NA should be one
specific bit pattern (?). And I wonder whether there is any guarantee
that the different functions used in R would return a NaN value
identical to R_NaN (which appears to be the 'IEEE NaN')?
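
To make the concern concrete, here is a small sketch of how a whole-value comparison and a low-word comparison can disagree. Everything in it is an assumption for illustration: the helper names are hypothetical, NA_BITS_ASSUMED is my guess at the NA_REAL layout (high word 0x7ff00000, low word 1954), and NA_BITS_QUIETED is the same value with the quiet-NaN bit set, which is a pattern that hardware might plausibly produce when a NaN passes through arithmetic. Whether R ever produces the second pattern is exactly the open question.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build a double from raw bits (assumes 64-bit double). */
static double nan_from_bits(uint64_t bits)
{
    double x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Whole-value comparison, in the spirit of the memcmp version of R_IsNA. */
static int same_bit_pattern(double a, double b)
{
    return memcmp(&a, &b, sizeof(double)) == 0;
}

/* Low 32 bits of the value, the part compared against 1954. */
static uint32_t low_word_of(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (uint32_t)(bits & 0xFFFFFFFFu);
}

/* Assumed NA layout, and the same payload with the quiet bit set. */
static const uint64_t NA_BITS_ASSUMED = 0x7FF00000000007A2ULL;
static const uint64_t NA_BITS_QUIETED = 0x7FF80000000007A2ULL;
```

Both patterns are NaNs with a low word of 1954, so a low-word test accepts both, whereas same_bit_pattern reports them as different. If the second pattern can occur in practice, the memcmp version would miss it.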
(Interested parties can view and run a simple benchmark from the gist at
https://gist.github.com/kevinushey/8911432.)
Thanks,
Kevin