[Rd] Reading 64-bit integers

Wed Mar 30 19:38:06 CEST 2011

> -----Original Message-----
> From: r-devel-bounces at r-project.org 
> [mailto:r-devel-bounces at r-project.org] On Behalf Of Simon Urbanek
> Sent: Tuesday, March 29, 2011 6:49 PM
> To: Duncan Murdoch
> Cc: r-devel at r-project.org
> Subject: Re: [Rd] Reading 64-bit integers
> 
> 
> On Mar 29, 2011, at 8:47 PM, Duncan Murdoch wrote:
> 
> > On 29/03/2011 7:01 PM, Jon Clayden wrote:
> >> Dear Simon,
> >> 
> >> On 29 March 2011 22:40, Simon Urbanek<simon.urbanek at r-project.org> wrote:
> >>> Jon,
> >>> 
> >>> On Mar 29, 2011, at 1:33 PM, Jon Clayden wrote:
> >>> 
> >>>> Dear Simon,
> >>>> 
> >>>> Thank you for the response.
> >>>> 
> >>>> On 29 March 2011 15:06, Simon Urbanek<simon.urbanek at r-project.org> wrote:
> >>>>> 
> >>>>> On Mar 29, 2011, at 8:46 AM, Jon Clayden wrote:
> >>>>> 
> >>>>>> Dear all,
> >>>>>> 
> >>>>>> I see from some previous threads that support for 64-bit integers in R
> >>>>>> may be an aim for future versions, but in the meantime I'm wondering
> >>>>>> whether it is possible to read in integers of greater than 32 bits at
> >>>>>> all. Judging from ?readBin, it should be possible to read 8-byte
> >>>>>> integers to some degree, but it is clearly limited in practice by R's
> >>>>>> internally 32-bit integer type:
> >>>>>> 
> >>>>>>> x<- as.raw(c(0,0,0,0,1,0,0,0))
> >>>>>>> (readBin(x,"integer",n=1,size=8,signed=F,endian="big"))
> >>>>>> [1] 16777216
> >>>>>>> x<- as.raw(c(0,0,0,1,0,0,0,0))
> >>>>>>> (readBin(x,"integer",n=1,size=8,signed=F,endian="big"))
> >>>>>> [1] 0
> >>>>>> 
> >>>>>> For values that fit into 32 bits it works fine, but for larger values
> >>>>>> it fails. (I'm a bit surprised by the zero - should the value not be
> >>>>>> NA if it is out of range?
> >>>>> 
> >>>>> No, it's not out of range - int is only 4 bytes, so only the 4
> >>>>> least-significant bytes (taking endianness into account) are used.
> >>>> 
> >>>> The fact remains that I ask for the value of an 8-byte integer and
> >>>> don't get it.
> >>> 
> >>> I think you're misinterpreting the documentation:
> >>> 
> >>>     If 'size' is specified and not the natural size of the object,
> >>>     each element of the vector is coerced to an appropriate type
> >>>     before being written or as it is read.
> >>> 
> >>> The "integer" object type is defined as signed 32-bit in R, so if you
> >>> ask for "8 bytes into object type integer", you get a coercion into
> >>> that object type -- 32-bit signed integer -- as documented. I think the
> >>> issue may come from the confusion of the object type "integer" with
> >>> the general "integer number" in the mathematical sense that has no
> >>> representation restrictions. (FWIW in C the "integer" type is "int" and
> >>> it is 32-bit on all modern OSes regardless of platform - that's where
> >>> the limitation comes from, it's not something R has made up).
> >> 
> >> OK, but it still seems like there is a case for raising a warning. As
> >> it is there is no way to tell when reading an 8-byte integer from a
> >> file whether its value is really 0, or if it merely has 0 in its
> >> least-significant 4 bytes. If 99% of such stored numbers are below
> >> 2^31, one is going to need some extra logic to catch the other 1%
> >> where you (silently) get the wrong value. In essence, unless you're
> >> certain that you will never come across a number that actually uses
> >> the upper 4 bytes, you will always have to read it as two 4-byte
> >> numbers and check that the high-order one (which is endianness
> >> dependent, of course) is zero. A C-level sanity check seems more
> >> efficient and more helpful to me.
> > 
> > Seems to me that the S-PLUS solution (output="double") would be a lot
> > more useful.  I'd commit that if you write it; I don't think I'd commit
> > the warning.
> > 
> 
> I was going to write something similar (idea = good, patch welcome ;)).
> My only worry is that the "output" argument is a bit misleading in that
> one could expect to use any combination of "input"/"output" which may be
> a maintenance nightmare. If I understand it correctly it's only a special
> case for integer input. I don't have S+ so can't say how they deal with
> that.

In S+'s readBin, the output argument can be
only double() or single() when the what argument
is double() or single() (S+ still has a real single
precision storage mode), and it can be any
numeric type or logical when what is integer().

The output=double() seemed like the only useful case.
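
For illustration only, here is a minimal C sketch of what reading an
8-byte integer into a double amounts to. The function name is made up
for this example and the code is not taken from the S+ or R sources;
it assumes big-endian input, as in the example earlier in the thread.

```c
#include <stdint.h>

/* Hypothetical helper (not part of R or S+): decode 8 big-endian bytes
 * as a signed 64-bit integer and return the value as a double. */
double be_int64_as_double(const unsigned char buf[8])
{
    uint64_t u = 0;
    int i;

    /* The most significant byte comes first in big-endian order. */
    for (i = 0; i < 8; i++)
        u = (u << 8) | buf[i];

    /* Converting an out-of-range uint64_t to int64_t is
     * implementation-defined before C23; this sketch assumes the usual
     * two's complement wraparound. */
    return (double) (int64_t) u;
}
```

With the bytes 0,0,0,1,0,0,0,0 from the original example this yields
4294967296 rather than 0.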

It does not warn when precision is lost in the 8-byte
integer to double conversion.  Perhaps it should.
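
As a rough sketch of the check such a warning would need (the helper
name is hypothetical and this is not code from S+ or R): an IEEE double
has a 53-bit significand, so some integers of magnitude above 2^53 do
not convert exactly. A round trip through double detects that,
assuming the default round-to-nearest mode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of R or S+): returns true when the
 * 8-byte signed integer x converts to double without losing precision. */
bool int64_fits_double(int64_t x)
{
    double d = (double) x;

    /* Values just below 2^63 round up to exactly 2^63, which cannot be
     * cast back to int64_t without undefined behaviour, so treat that
     * case as a loss of precision before casting. */
    if (d >= 9223372036854775808.0)
        return false;

    /* Exact conversions survive the round trip unchanged. */
    return (int64_t) d == x;
}
```

A reader could call this once per element and issue a single warning
if any call returns false.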

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

> 
> Cheers,
> Simon
> 
> 
> > 
> >> 
> >>>> Pretending that it's really only four bytes because of
> >>>> the limits of R's integer type isn't all that helpful. Perhaps a
> >>>> warning should be put out if the cast will affect the value of the
> >>>> result? It looks like the relevant lines in src/main/connections.c are
> >>>> 3689-3697 in the current alpha:
> >>>> 
> >>>> #if SIZEOF_LONG == 8
> >>>>                   case sizeof(long):
> >>>>                       INTEGER(ans)[i] = (int)*((long *)buf);
> >>>>                       break;
> >>>> #elif SIZEOF_LONG_LONG == 8
> >>>>                   case sizeof(_lli_t):
> >>>>                       INTEGER(ans)[i] = (int)*((_lli_t *)buf);
> >>>>                       break;
> >>>> #endif
> >>>> 
> >>>>>> ) The value can be represented as a double,
> >>>>>> though:
> >>>>>> 
> >>>>>>> 4294967296
> >>>>>> [1] 4294967296
> >>>>>> 
> >>>>>> I wouldn't expect readBin() to return a double if an integer was
> >>>>>> requested, but is there any way to get the correct value out of it?
> >>>>> 
> >>>>> Trivially (for your unsigned big-endian case):
> >>>>> 
> >>>>> y<- readBin(x, "integer", n=length(x)/4L, endian="big")
> >>>>> y<- ifelse(y<  0, 2^32 + y, y)
> >>>>> i<- seq(1,length(y),2)
> >>>>> y<- y[i] * 2^32 + y[i + 1L]
> >>>> 
> >>>> Thanks for the code, but I'm not sure I would call that trivial,
> >>>> especially if one needs to cater for little endian and signed cases as
> >>>> well!
> >>> 
> >>> I was saying for your case and it's trivial as in read as integers,
> >>> convert to double precision and add.
> >>> 
> >>> 
> >>>> This is what I meant by reconstructing the number manually...
> >>>> 
> >>> 
> >>> You didn't say so - you were talking about reconstructing it from a raw
> >>> vector which seems a lot more painful since you can't compute with
> >>> enough precision on raw vectors.
> >> 
> >> True - I should have been more specific. Sorry.
> >> 
> >> Jon
> >> 
> >> ______________________________________________
> >> R-devel at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> > 
> > 
> 