[Rd] boolean and logical types -draft

Prof Brian Ripley r|p|eybd @end|ng |rom |c|oud@com
Mon Feb 3 18:36:28 CET 2025


Tomas,

I am thinking of writing something for R-devel, and hope to have your 
input first.

I get moderated on R-devel as I am now subscribed as 
brian.ripley using R-project.org which of course I cannot send from. So I am 
even more discouraged from posting there.  (R-core is bad enough with 
Luke discouraging all innovation except by him and Simon completely 
misunderstanding the C23 status.)

Thanks,

Brian

----------------

There are several of these, and few guarantees for inter-working.

a) R's logical vectors, which include a value NA for its elements.
b) R's Rboolean type in C/C++

c) C++'s bool type
d) C23's bool type
e) C99's _Bool type to which bool is aliased if <stdbool.h> is included.
f) Fortran's LOGICAL type

a) is currently implemented as a C int (so 32-bit) type with NA as the C 
value NA_LOGICAL which is the same as NA_INTEGER.
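
To make a) concrete, here is a minimal sketch of reading one element of a 
logical vector from C.  The names SKETCH_NA_LOGICAL and sketch_as_bool are 
made up for illustration; real code would use NA_LOGICAL from the R headers. 
The sketch assumes the NA value is INT_MIN, which is what R currently uses 
for NA_INTEGER.

```c
#include <limits.h>
#include <stdbool.h>

/* Stand-in for R's NA_LOGICAL so the sketch compiles without R headers.
   Assumption: same value as NA_INTEGER, taken here to be INT_MIN. */
#define SKETCH_NA_LOGICAL INT_MIN

/* Convert one element of a logical vector (stored as int) to a C bool.
   Sets *is_na and returns false when the element is NA; otherwise
   zero maps to false and any other value to true. */
static bool sketch_as_bool(int elt, bool *is_na)
{
    *is_na = (elt == SKETCH_NA_LOGICAL);
    if (*is_na)
        return false;
    return elt != 0;
}
```

The point is only that the NA test has to come before any conversion to 
bool, since NA is a non-zero int and would otherwise silently become true.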

b) is currently implemented as a C enum with two values.  I don't know 
of any guarantees on how that is stored, other than that it is 
compatible with char or a signed or unsigned integer type (C23 
§6.7.3.3) -- however it seems common practice to use a 32-bit type (int 
or unsigned int would not be distinguishable).  Enums can be given a 
specified underlying type, but ours is not.
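
A quick way to see what a given compiler does for b) is to ask it.  This is 
a sketch only: sketch_boolean is a hypothetical two-valued enum shaped like 
Rboolean, not the declaration from the R headers, and the result is 
implementation-defined.

```c
#include <stddef.h>

/* Hypothetical two-valued enum shaped like Rboolean (not R's header). */
typedef enum { SKETCH_FALSE = 0, SKETCH_TRUE } sketch_boolean;

/* Storage size the compiler chose for the enum (implementation-defined). */
static size_t sketch_boolean_size(void)
{
    return sizeof(sketch_boolean);
}

/* Non-zero if the enum occupies the same amount of storage as int here. */
static int sketch_boolean_is_int_sized(void)
{
    return sizeof(sketch_boolean) == sizeof(int);
}
```

Compiler options that shrink enums to the smallest fitting type would 
change the answer, which is why the size cannot be relied on.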

C23 states that bool has 1 value bit and some padding bits (§6.2.6.2) so 
it can be stored in char-sized storage (i.e. bytes) or multiples 
thereof, and that _Bool is an alternative name for bool.

f) is compiler-dependent: for interoperability with C or R, code should 
use c_bool from iso_c_binding (Fortran 2003).  Fortran compilers store 
LOGICAL in compiler-dependent ways, and for a long time we got away with 
assuming that was equivalent to int (so LOGICAL values could be passed 
to and from Fortran as int* on the C/R side).  But sometime around GCC 8 
gfortran changed to int_least32_t, which on common platforms is the same 
as int but does not need to be.
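
For f), the C side of an interoperable routine might look like the sketch 
below.  The routine name sketch_negate is hypothetical, and the Fortran 
interface is shown only as a comment.

```c
#include <stdbool.h>

/* C side of a hypothetical routine callable from Fortran via
 *
 *   interface
 *     subroutine sketch_negate(flag) bind(C, name = "sketch_negate")
 *       use iso_c_binding, only: c_bool
 *       logical(c_bool), intent(inout) :: flag
 *     end subroutine sketch_negate
 *   end interface
 *
 * logical(c_bool) is the kind that interoperates with C's _Bool, so the
 * argument is a bool* here rather than the int* that default LOGICAL
 * was once assumed to match.
 */
void sketch_negate(bool *flag)
{
    *flag = !*flag;
}
```

Default-kind LOGICAL arguments would still need converting to 
logical(c_bool) on the Fortran side before such a call.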

It seems that in all cases coercion to an integer type coerces false 
values to 0 and true values to 1 (and this is guaranteed by C23 at 
least).  And C23 guarantees that when coercing from an integer type to 
bool zero values are coerced to false and non-zero ones to true (bool is 
'an unsigned integer type').  However, that does not seem to be true for 
C++ as UB sanitizers warn on coercing values other than 0/1.
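
The C conversion rules in the paragraph above can be sketched as follows 
(C99 or later with <stdbool.h>; all function names are hypothetical).  The 
contrast with unsigned char assumes an 8-bit char.

```c
#include <stdbool.h>

/* Integer -> bool: any non-zero value converts to true. */
static bool sketch_int_to_bool(int x)
{
    return (bool)x;
}

/* bool -> int: false converts to 0 and true to 1. */
static int sketch_bool_to_int(bool b)
{
    return (int)b;
}

/* For contrast: conversion to unsigned char reduces the value modulo
   UCHAR_MAX + 1, so 256 becomes 0 when char has 8 bits. */
static unsigned char sketch_int_to_uchar(int x)
{
    return (unsigned char)x;
}

/* Explicit normalisation to 0/1 that does not rely on a bool conversion. */
static int sketch_normalise(int x)
{
    return x != 0;
}
```

So converting 256 to bool gives true although its low byte is zero: the 
conversion is a comparison against zero, not a truncation.  Writing a value 
other than 0 or 1 directly into the storage of a bool (for example through 
a char pointer) is a different matter and is not covered by these rules.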

I believe it to be the intention that c), d) and e) have the same 
representation and interwork using the same compiler, but I could not 
find that documented and see signs that e) might differ in C17 and C23 
modes.

----------------

I need to look again at the C and C++ standards which with my vision I 
need to do in very small chunks.  Oh for the vision I once had!

-- 
Brian D. Ripley,                  ripley using stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford


