[Rd] Spurious warnings in coercion from double/complex/character to raw

Hervé Pagès hp@ge@@on@g|thub @end|ng |rom gm@||@com
Fri Sep 10 21:13:30 CEST 2021



On 10/09/2021 09:12, Duncan Murdoch wrote:
> On 10/09/2021 11:29 a.m., Hervé Pagès wrote:
>> Hi,
>>
>> The first warning below is unexpected and confusing:
>>
>>     > as.raw(c(3e9, 5.1))
>>     [1] 00 05
>>     Warning messages:
>>     1: NAs introduced by coercion to integer range
>>     2: out-of-range values treated as 0 in coercion to raw
>>
>> The reason we get it is that coercion from numeric to raw is currently
>> implemented on top of coercion from numeric to int (file
>> src/main/coerce.c, lines 700-710):
>>
>>       case REALSXP:
>>           for (i = 0; i < n; i++) {
>> //          if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt();
>>               tmp = IntegerFromReal(REAL_ELT(v, i), &warn);
>>               if(tmp == NA_INTEGER || tmp < 0 || tmp > 255) {
>>                   tmp = 0;
>>                   warn |= WARN_RAW;
>>               }
>>               pa[i] = (Rbyte) tmp;
>>           }
>>           break;
>>
>> The first warning comes from the call to IntegerFromReal().
>>
>> The following code avoids the spurious warning and is also simpler and
>> slightly faster:
>>
>>       case REALSXP:
>>           for (i = 0; i < n; i++) {
>> //          if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt();
>>               double vi = REAL_ELT(v, i);
>>               if(ISNAN(vi) || (tmp = (int) vi) < 0 || tmp > 255) {
>>                   tmp = 0;
>>                   warn |= WARN_RAW;
>>               }
>>               pa[i] = (Rbyte) tmp;
>>           }
>>           break;
> 
> Doesn't that give different results in case vi is so large that "(int) 
> vi" overflows?  (I don't know what the C standard says, but some online 
> references say that behaviour is implementation dependent.)
> 
> For example, if
> 
>    vi = 1.0 +  INT_MAX;
> 
> wouldn't "(int) vi" be equal to a small integer?

Good catch, thanks!

Replacing

     if(ISNAN(vi) || (tmp = (int) vi) < 0 || tmp > 255) {
         tmp = 0;

         warn |= WARN_RAW;

     }
     pa[i] = (Rbyte) tmp;

with

     if(ISNAN(vi) || vi <= -1.0 || vi >= 256.0)
  {
         tmp = 0;

         warn |= WARN_RAW;

     } else {
         tmp = (int) vi;
     }
     pa[i] = (Rbyte) tmp;

should address that.

FWIW IntegerFromReal() has a similar risk of int overflow 
(src/main/coerce.c, lines 128-138):

   int attribute_hidden

   IntegerFromReal(double x, int *warn)

   {

       if (ISNAN(x))

           return NA_INTEGER;

       else if (x >= INT_MAX+1. || x <= INT_MIN ) {

           *warn |= WARN_INT_NA;

           return NA_INTEGER;

       }

       return (int) x;

   }



The cast to int will also be an int overflow situation if x is > INT_MAX 
and < INT_MAX+1 so the risk is small! There are other instances of this 
situation in IntegerFromComplex() and IntegerFromString().

More below...

> 
> Duncan Murdoch
> 
> 
>>
>> Coercion from complex to raw has the same problem:
>>
>>     > as.raw(c(3e9+0i, 5.1))
>>     [1] 00 05
>>     Warning messages:
>>     1: NAs introduced by coercion to integer range
>>     2: out-of-range values treated as 0 in coercion to raw
>>
>> Current implementation (file src/main/coerce.c, lines 711-721):
>>
>>       case CPLXSXP:
>>           for (i = 0; i < n; i++) {
>> //          if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt();
>>               tmp = IntegerFromComplex(COMPLEX_ELT(v, i), &warn);
>>               if(tmp == NA_INTEGER || tmp < 0 || tmp > 255) {
>>                   tmp = 0;
>>                   warn |= WARN_RAW;
>>               }
>>               pa[i] = (Rbyte) tmp;
>>           }
>>           break;
>>
>> This implementation has the following additional problem when the
>> supplied complex has a nonzero imaginary part:
>>
>>     > as.raw(300+4i)
>>     [1] 00
>>     Warning messages:
>>     1: imaginary parts discarded in coercion
>>     2: out-of-range values treated as 0 in coercion to raw
>>
>>     > as.raw(3e9+4i)
>>     [1] 00
>>     Warning messages:
>>     1: NAs introduced by coercion to integer range
>>     2: out-of-range values treated as 0 in coercion to raw
>>
>> In one case we get a warning about the discarding of the imaginary part
>> but not the other case, which is unexpected. We should see the exact
>> same warning (or warnings) in both cases.
>>
>> With the following fix we only get the warning about the discarding of
>> the imaginary part if we are not in a "out-of-range values treated as 0
>> in coercion to raw" situation:
>>
>>       case CPLXSXP:
>>           for (i = 0; i < n; i++) {
>> //          if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt();
>>               Rcomplex vi = COMPLEX_ELT(v, i);
>>               if(ISNAN(vi.r) || ISNAN(vi.i) || (tmp = (int) vi.r) < 0 ||
>> tmp > 255) {
>>                   tmp = 0;
>>                   warn |= WARN_RAW;
>>               } else {
>>                   if(vi.i != 0.0)
>>                       warn |= WARN_IMAG;
>>               }
>>               pa[i] = (Rbyte) tmp;
>>           }
>>           break;

Corrected version:

     if(ISNAN(vi.r) || ISNAN(vi.i) || vi.r <= -1.00 ||
  vi.r >= 256.00) {

         tmp = 0;

         warn |= WARN_RAW;

     } else {

         tmp = (int) vi.r;
         if(vi.i != 0.0)

             warn |= WARN_IMAG;

     }

     pa[i] = (Rbyte) tmp;

Hopefully this time I got it right.

Best,
H.

>>
>> Finally, coercion from character to raw has the same problem and its
>> code can be fixed in a similar manner:
>>
>>     > as.raw(c("3e9", 5.1))
>>     [1] 00 05
>>     Warning messages:
>>     1: NAs introduced by coercion to integer range
>>     2: out-of-range values treated as 0 in coercion to raw
>>
>> Cheers,
>> H.
>>
>>
> 

-- 
Hervé Pagès

Bioconductor Core Team
hpages.on.github using gmail.com



More information about the R-devel mailing list