[Rd] 1954 from NA

Mon May 24 12:31:53 CEST 2021

On 5/24/21 11:46 AM, Adrian Dușa wrote:
> On Sun, May 23, 2021 at 10:14 PM Tomas Kalibera
> <tomas.kalibera using gmail.com> wrote:
>
>     [...]
>
>     Good, but unfortunately the delineation between computation and
>     non-computation is not always transparent. Even if an operation
>     doesn't look like "computation" at a high level, it may
>     internally involve computation - so, really, an R NA can become R
>     NaN and vice versa, at any point (this is not a "feature", but it
>     is how things are now).
>
>
> I see.
> Well, this is a risk we'll have to consider when the time comes. For 
> the moment, storing some metadata within the payload seems to work.
>
>     [...]
>
>     Ok, then I would probably keep the metadata on the missing values
>     on the side when implementing such missing values in that code,
>     and treat them explicitly in supported operations.
>
>     But, in principle, you can use the floating-point NaN payloads,
>     and you can pass such values to R. You just need to be prepared
>     to lose not only your payloads/tags, but also the difference
>     between R NA and R NaN. Thanks to the value semantics of R, you
>     would not lose the tags in input values with proper reference
>     counts (e.g. marked immutable), because those values will not be
>     modified.
>
> NaNs are fine of course, but then some (social science?) users might 
> get confused about the difference between NAs and NaNs, and for this 
> reason only I would still like to preserve the 1954 payload.
> If at all possible, however, the extra 16 bits from this payload would
> make a whole lot of difference.
>
> Please forgive my persistence, but would it be possible to use an 
> unsigned short instead of an unsigned int for the 1954 payload?
> That is, if it doesn't break anything, but I don't really see what it 
> could. The corresponding check function seems to work just fine and it 
> doesn't need to be changed at all:
>
> int R_IsNA(double x)
> {
>     if (isnan(x)) {
>         ieee_double y;
>         y.value = x;
>         return (y.word[lw] == 1954);
>     }
>     return 0;
> }
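
To make the bit layout behind the quoted check concrete, here is a minimal standalone sketch. It does not use R's headers: the helper names (nan_with_low_word, low_word, is_na_32, is_na_16, tag_of) are hypothetical, and the layout (exponent bits all ones, low 32-bit word equal to 1954) is assumed from the check quoted above. It contrasts a comparison of all 32 low bits with a comparison of only the low 16 bits, which is roughly what the request amounts to.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build a NaN whose low 32-bit word is `low`.
   The exponent bits are all ones; `low` must be nonzero so that the
   mantissa is nonzero and the value is a NaN rather than infinity. */
static double nan_with_low_word(uint32_t low)
{
    uint64_t bits = ((uint64_t)0x7FF00000u << 32) | (uint64_t)low;
    double x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Extract the low 32-bit word of a double, independent of byte order. */
static uint32_t low_word(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (uint32_t)(bits & 0xFFFFFFFFu);
}

/* Check in the style of the quoted R_IsNA: all 32 low bits are compared. */
static int is_na_32(double x)
{
    return isnan(x) && low_word(x) == 1954u;
}

/* Assumed variant of the request: only the low 16 bits are compared. */
static int is_na_16(double x)
{
    return isnan(x) && (low_word(x) & 0xFFFFu) == 1954u;
}

/* The "extra 16 bits": the upper half of the low word. */
static unsigned tag_of(double x)
{
    return (low_word(x) >> 16) & 0xFFFFu;
}
```

Under these assumptions, a value built with nan_with_low_word((42u << 16) | 1954u) passes is_na_16 but fails is_na_32, so a tag stored in the upper half of the low word is only recognised as NA if the comparison itself is narrowed. Nothing in the sketch protects the tag from being dropped by later arithmetic.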

For the reasons I explained, I would be against such a change. Keeping
the data on the side, as also recommended by others on this list, would
give you a reliable implementation. I don't want to support fragile
package code that builds on unspecified R internals, particularly when
those internals have themselves not stood the test of time and so are
at high risk of change.

Best
Tomas

>
> Best wishes,
> Adrian
