[Rd] Reading 64-bit integers

Simon Urbanek simon.urbanek at r-project.org
Wed Mar 30 03:49:28 CEST 2011


On Mar 29, 2011, at 8:47 PM, Duncan Murdoch wrote:

> On 29/03/2011 7:01 PM, Jon Clayden wrote:
>> Dear Simon,
>> 
>> On 29 March 2011 22:40, Simon Urbanek<simon.urbanek at r-project.org>  wrote:
>>> Jon,
>>> 
>>> On Mar 29, 2011, at 1:33 PM, Jon Clayden wrote:
>>> 
>>>> Dear Simon,
>>>> 
>>>> Thank you for the response.
>>>> 
>>>> On 29 March 2011 15:06, Simon Urbanek<simon.urbanek at r-project.org>  wrote:
>>>>> 
>>>>> On Mar 29, 2011, at 8:46 AM, Jon Clayden wrote:
>>>>> 
>>>>>> Dear all,
>>>>>> 
>>>>>> I see from some previous threads that support for 64-bit integers in R
>>>>>> may be an aim for future versions, but in the meantime I'm wondering
>>>>>> whether it is possible to read in integers of greater than 32 bits at
>>>>>> all. Judging from ?readBin, it should be possible to read 8-byte
>>>>>> integers to some degree, but it is clearly limited in practice by R's
>>>>>> internally 32-bit integer type:
>>>>>> 
>>>>>>> x<- as.raw(c(0,0,0,0,1,0,0,0))
>>>>>>> (readBin(x,"integer",n=1,size=8,signed=F,endian="big"))
>>>>>> [1] 16777216
>>>>>>> x<- as.raw(c(0,0,0,1,0,0,0,0))
>>>>>>> (readBin(x,"integer",n=1,size=8,signed=F,endian="big"))
>>>>>> [1] 0
>>>>>> 
>>>>>> For values that fit into 32 bits it works fine, but for larger values
>>>>>> it fails. (I'm a bit surprised by the zero - should the value not be
>>>>>> NA if it is out of range?
>>>>> 
>>>>> No, it's not out of range - int is only 4 bytes, so only the first 4 bytes (respecting endianness order, hence the least-significant ones) are used.
>>>> 
>>>> The fact remains that I ask for the value of an 8-byte integer and
>>>> don't get it.
>>> 
>>> I think you're misinterpreting the documentation:
>>> 
>>>     If ‘size’ is specified and not the natural size of the object,
>>>     each element of the vector is coerced to an appropriate type
>>>     before being written or as it is read.
>>> 
>>> The "integer" object type is defined as signed 32-bit in R, so if you ask for "8 bytes into object type integer", you get a coercion into that object type -- 32-bit signed integer -- as documented. I think the issue may come from the confusion of the object type "integer" with general "integer number" in mathematical sense that has no representation restrictions. (FWIW in C the "integer" type is "int" and it is 32-bit on all modern OSes regardless of platform - that's where the limitation comes from, it's not something R has made up).
>> 
>> OK, but it still seems like there is a case for raising a warning. As
>> it is there is no way to tell when reading an 8-byte integer from a
>> file whether its value is really 0, or if it merely has 0 in its
>> least-significant 4 bytes. If 99% of such stored numbers are below
>> 2^31, one is going to need some extra logic to catch the other 1%
>> where you (silently) get the wrong value. In essence, unless you're
>> certain that you will never come across a number that actually uses
>> the upper 4 bytes, you will always have to read it as two 4-byte
>> numbers and check that the high-order one (which is endianness
>> dependent, of course) is zero. A C-level sanity check seems more
>> efficient and more helpful to me.
> 
> Seems to me that the S-PLUS solution (output="double") would be a lot more useful.  I'd commit that if you write it; I don't think I'd commit the warning.
> 

I was going to write something similar (idea = good, patch welcome ;)). My only worry is that the "output" argument is a bit misleading, in that one could expect to use any combination of "input"/"output", which may be a maintenance nightmare. If I understand it correctly, it's only a special case for integer input. I don't have S+, so I can't say how they deal with that.
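
For illustration only, here is a minimal R-level sketch of what "8-byte integer in, double out" could look like. The function name read_int64_as_double and its arguments are hypothetical; this is not the S-PLUS interface and not a proposed patch to readBin(). It assumes the input is a raw vector or binary connection holding whole 8-byte values, and that the result is only exact while it stays within the 53-bit range of a double.

```r
# Hypothetical helper (not part of R or S-PLUS): read 8-byte integers
# and return them as doubles instead of truncating to 32-bit integers.
read_int64_as_double <- function(con, n = 1L, signed = TRUE,
                                 endian = .Platform$endian) {
  stopifnot(endian %in% c("big", "little"))
  # Fetch the bytes untouched; readBin() accepts a raw vector as 'con'
  bytes <- readBin(con, "raw", n = 8L * n)
  n <- length(bytes) %/% 8L
  if (n == 0L) return(numeric(0))
  # One column per 8-byte value, each byte as an integer in 0..255
  m <- matrix(as.integer(bytes[seq_len(8L * n)]), nrow = 8L)
  # Reorder rows so that row 1 is always the most significant byte
  if (endian == "little") m <- m[8:1, , drop = FALSE]
  # Combine each half into an unsigned 32-bit quantity held in a double
  w  <- 256^(3:0)
  hi <- colSums(m[1:4, , drop = FALSE] * w)
  lo <- colSums(m[5:8, , drop = FALSE] * w)
  # Two's complement: the sign lives in the high-order half only
  if (signed) hi <- ifelse(hi >= 2^31, hi - 2^32, hi)
  value <- hi * 2^32 + lo
  # Doubles carry 53 bits of precision; larger magnitudes may be rounded
  if (any(abs(value) >= 2^53))
    warning("some values are at or beyond 2^53 and may not be exact")
  value
}

x <- as.raw(c(0, 0, 0, 1, 0, 0, 0, 0))
read_int64_as_double(x, endian = "big")
# [1] 4294967296
```

The warning above 2^53 is a design choice for this sketch: it flags the values that a double cannot be trusted to hold exactly, rather than returning them silently.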

Cheers,
Simon


> 
>> 
>>>> Pretending that it's really only four bytes because of
>>>> the limits of R's integer type isn't all that helpful. Perhaps a
>>>> warning should be put out if the cast will affect the value of the
>>>> result? It looks like the relevant lines in src/main/connections.c are
>>>> 3689-3697 in the current alpha:
>>>> 
>>>> #if SIZEOF_LONG == 8
>>>>                   case sizeof(long):
>>>>                       INTEGER(ans)[i] = (int)*((long *)buf);
>>>>                       break;
>>>> #elif SIZEOF_LONG_LONG == 8
>>>>                   case sizeof(_lli_t):
>>>>                       INTEGER(ans)[i] = (int)*((_lli_t *)buf);
>>>>                       break;
>>>> #endif
>>>> 
>>>>>> ) The value can be represented as a double,
>>>>>> though:
>>>>>> 
>>>>>>> 4294967296
>>>>>> [1] 4294967296
>>>>>> 
>>>>>> I wouldn't expect readBin() to return a double if an integer was
>>>>>> requested, but is there any way to get the correct value out of it?
>>>>> 
>>>>> Trivially (for your unsigned big-endian case):
>>>>> 
>>>>> y<- readBin(x, "integer", n=length(x)/4L, endian="big")
>>>>> y<- ifelse(y<  0, 2^32 + y, y)
>>>>> i<- seq(1,length(y),2)
>>>>> y<- y[i] * 2^32 + y[i + 1L]
>>>> 
>>>> Thanks for the code, but I'm not sure I would call that trivial,
>>>> especially if one needs to cater for little endian and signed cases as
>>>> well!
>>> 
>>> I was talking about your case, and it's trivial in the sense of: read as integers, convert to double precision and add.
>>> 
>>> 
>>>> This is what I meant by reconstructing the number manually...
>>>> 
>>> 
>>> You didn't say so - you were talking about reconstructing it from a raw vector which seems a lot more painful since you can't compute with enough precision on raw vectors.
>> 
>> True - I should have been more specific. Sorry.
>> 
>> Jon
>> 
> 
> 


