[Rd] Spurious warnings in coercion from double/complex/character to raw
Hervé Pagès
hp@ge@@on@g|thub @end|ng |rom gm@||@com
Fri Sep 10 21:13:30 CEST 2021
On 10/09/2021 09:12, Duncan Murdoch wrote:
> On 10/09/2021 11:29 a.m., Hervé Pagès wrote:
>> Hi,
>>
>> The first warning below is unexpected and confusing:
>>
>> > as.raw(c(3e9, 5.1))
>> [1] 00 05
>> Warning messages:
>> 1: NAs introduced by coercion to integer range
>> 2: out-of-range values treated as 0 in coercion to raw
>>
>> The reason we get it is that coercion from numeric to raw is currently
>> implemented on top of coercion from numeric to int (file
>> src/main/coerce.c, lines 700-710):
>>
>> case REALSXP:
>> for (i = 0; i < n; i++) {
>> // if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt();
>> tmp = IntegerFromReal(REAL_ELT(v, i), &warn);
>> if(tmp == NA_INTEGER || tmp < 0 || tmp > 255) {
>> tmp = 0;
>> warn |= WARN_RAW;
>> }
>> pa[i] = (Rbyte) tmp;
>> }
>> break;
>>
>> The first warning comes from the call to IntegerFromReal().
>>
>> The following code avoids the spurious warning and is also simpler and
>> slightly faster:
>>
>> case REALSXP:
>> for (i = 0; i < n; i++) {
>> // if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt();
>> double vi = REAL_ELT(v, i);
>> if(ISNAN(vi) || (tmp = (int) vi) < 0 || tmp > 255) {
>> tmp = 0;
>> warn |= WARN_RAW;
>> }
>> pa[i] = (Rbyte) tmp;
>> }
>> break;
>
> Doesn't that give different results in case vi is so large that "(int)
> vi" overflows? (I don't know what the C standard says, but some online
> references say that behaviour is implementation dependent.)
>
> For example, if
>
> vi = 1.0 + INT_MAX;
>
> wouldn't "(int) vi" be equal to a small integer?
Good catch, thanks!
Replacing
if(ISNAN(vi) || (tmp = (int) vi) < 0 || tmp > 255) {
tmp = 0;
warn |= WARN_RAW;
}
pa[i] = (Rbyte) tmp;
with
if(ISNAN(vi) || vi <= -1.0 || vi >= 256.0)
{
tmp = 0;
warn |= WARN_RAW;
} else {
tmp = (int) vi;
}
pa[i] = (Rbyte) tmp;
should address that.
FWIW IntegerFromReal() has a similar risk of int overflow
(src/main/coerce.c, lines 128-138):
int attribute_hidden
IntegerFromReal(double x, int *warn)
{
if (ISNAN(x))
return NA_INTEGER;
else if (x >= INT_MAX+1. || x <= INT_MIN ) {
*warn |= WARN_INT_NA;
return NA_INTEGER;
}
return (int) x;
}
The cast to int will also be an int overflow situation if x is > INT_MAX
and < INT_MAX+1 so the risk is small! There are other instances of this
situation in IntegerFromComplex() and IntegerFromString().
More below...
>
> Duncan Murdoch
>
>
>>
>> Coercion from complex to raw has the same problem:
>>
>> > as.raw(c(3e9+0i, 5.1))
>> [1] 00 05
>> Warning messages:
>> 1: NAs introduced by coercion to integer range
>> 2: out-of-range values treated as 0 in coercion to raw
>>
>> Current implementation (file src/main/coerce.c, lines 711-721):
>>
>> case CPLXSXP:
>> for (i = 0; i < n; i++) {
>> // if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt();
>> tmp = IntegerFromComplex(COMPLEX_ELT(v, i), &warn);
>> if(tmp == NA_INTEGER || tmp < 0 || tmp > 255) {
>> tmp = 0;
>> warn |= WARN_RAW;
>> }
>> pa[i] = (Rbyte) tmp;
>> }
>> break;
>>
>> This implementation has the following additional problem when the
>> supplied complex has a nonzero imaginary part:
>>
>> > as.raw(300+4i)
>> [1] 00
>> Warning messages:
>> 1: imaginary parts discarded in coercion
>> 2: out-of-range values treated as 0 in coercion to raw
>>
>> > as.raw(3e9+4i)
>> [1] 00
>> Warning messages:
>> 1: NAs introduced by coercion to integer range
>> 2: out-of-range values treated as 0 in coercion to raw
>>
>> In one case we get a warning about the discarding of the imaginary part
>> but not the other case, which is unexpected. We should see the exact
>> same warning (or warnings) in both cases.
>>
>> With the following fix we only get the warning about the discarding of
>> the imaginary part if we are not in a "out-of-range values treated as 0
>> in coercion to raw" situation:
>>
>> case CPLXSXP:
>> for (i = 0; i < n; i++) {
>> // if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt();
>> Rcomplex vi = COMPLEX_ELT(v, i);
>> if(ISNAN(vi.r) || ISNAN(vi.i) || (tmp = (int) vi.r) < 0 ||
>> tmp > 255) {
>> tmp = 0;
>> warn |= WARN_RAW;
>> } else {
>> if(vi.i != 0.0)
>> warn |= WARN_IMAG;
>> }
>> pa[i] = (Rbyte) tmp;
>> }
>> break;
Corrected version:
if(ISNAN(vi.r) || ISNAN(vi.i) || vi.r <= -1.00 ||
vi.r >= 256.00) {
tmp = 0;
warn |= WARN_RAW;
} else {
tmp = (int) vi.r;
if(vi.i != 0.0)
warn |= WARN_IMAG;
}
pa[i] = (Rbyte) tmp;
Hopefully this time I got it right.
Best,
H.
>>
>> Finally, coercion from character to raw has the same problem and its
>> code can be fixed in a similar manner:
>>
>> > as.raw(c("3e9", 5.1))
>> [1] 00 05
>> Warning messages:
>> 1: NAs introduced by coercion to integer range
>> 2: out-of-range values treated as 0 in coercion to raw
>>
>> Cheers,
>> H.
>>
>>
>
--
Hervé Pagès
Bioconductor Core Team
hpages.on.github using gmail.com
More information about the R-devel
mailing list