[R] readBin fails to read large files

Matt Shotwell matt at biostatmatt.com
Thu Sep 1 19:36:25 CEST 2011


On Thu, 2011-09-01 at 17:36 +0100, Prof Brian Ripley wrote:
> readBin is intended to read a few items at a time, not 10^9.  You are 
> probably getting 32-bit integer overflow inside your OS, since the 
> number of bytes you are trying to read in one go exceeds 2GB.
> 
> Don't do that: read say a million at a time.
> 
> And BTW, if these really are unsigned ints you will get wraparound.

To elaborate, ?readBin says that the 'signed' argument is only used for
integers of size 1 and 2 bytes. These are ultimately converted to signed
4 byte integers, because that's how R stores integers. To be exact, if
your file contains integers larger than 2^31-1 = 2147483647, wraparound
would occur: they come back as negative integers. The exception is 2^31
itself, whose bit pattern is the one R uses for NA_integer_, so R
returns NA for that value.
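
A quick way to check how 4 byte values above 2^31-1 come back is to
write a few bit patterns by hand and read them with size 4. This is only
a sketch: the three values and the use of a raw vector in place of a
file are my own choices for illustration.

```r
# Three unsigned 4 byte values in little endian byte order:
# 2^31 - 1, 2^31, and 2^32 - 1.
bytes <- as.raw(c(0xff, 0xff, 0xff, 0x7f,
                  0x00, 0x00, 0x00, 0x80,
                  0xff, 0xff, 0xff, 0xff))

# readBin() accepts a raw vector in place of a connection.
x <- readBin(bytes, what = integer(), n = 3, size = 4, endian = "little")
print(x)
```

The first value fits and is returned unchanged, the second has the bit
pattern of NA_integer_, and the third is read as -1.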

I'm bringing this up because R normally issues a warning:

R> 2147483647L + 1L
[1] NA
Warning message:
In 2147483647L + 1L : NAs produced by integer overflow

But, a similar warning isn't issued by readBin when NA results from
signed integer overflow:

#The raw vector below represents 2147483647L and 2147483647L + 1L
#in little endian, unsigned, 4 byte integers 
R> dat <- as.raw(c(0xff,0xff,0xff,0x7f,0x00,0x00,0x00,0x80))
R> writeBin(dat, 'test.bin')
R> readBin('test.bin', n=2, integer(), signed=FALSE)
[1] 2147483647         NA
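
One possible workaround, sketched under the assumption that the file
really holds little endian unsigned 4 byte integers: read each value as
two unsigned 2 byte halves, for which 'signed = FALSE' is honoured, and
combine them as doubles. The helper name read_uint32_le is hypothetical,
not part of R.

```r
# Hypothetical helper: read n unsigned 4 byte little endian integers
# from 'con' (a connection or raw vector) and return them as doubles.
read_uint32_le <- function(con, n) {
  # Each 4 byte value is read as two unsigned 2 byte halves.
  halves <- readBin(con, what = integer(), n = 2 * n, size = 2,
                    signed = FALSE, endian = "little")
  # Column j of the matrix holds (low half, high half) of value j,
  # because the low 16 bits come first in little endian order.
  m <- matrix(halves, nrow = 2)
  # 65536 is a double, so the arithmetic cannot overflow an integer.
  m[2, ] * 65536 + m[1, ]
}

# Same bytes as in the example above: 2^31 - 1 and 2^31.
dat <- as.raw(c(0xff, 0xff, 0xff, 0x7f, 0x00, 0x00, 0x00, 0x80))
vals <- read_uint32_le(dat, 2)
print(vals)
```

Both values come back exactly (2147483647 and 2147483648), at the cost
of storing them as doubles, which take 8 bytes each instead of 4.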

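As for reading in smaller pieces, here is a minimal sketch of what that
could look like. The helper name read_in_chunks, the default chunk size
and the temporary test file are my own assumptions; it reads plain
signed 4 byte integers, so values above 2^31-1 would still wrap as
described above.

```r
# Hypothetical helper: read all 4 byte integers in 'path' in chunks of
# 'chunk' items instead of one very large readBin() call.
read_in_chunks <- function(path, chunk = 1e6, endian = "little") {
  total <- file.info(path)$size %/% 4   # number of complete 4 byte items
  out <- integer(total)                 # preallocate the result once
  con <- file(path, open = "rb")
  on.exit(close(con))
  pos <- 0
  while (pos < total) {
    x <- readBin(con, what = integer(), n = min(chunk, total - pos),
                 size = 4, endian = endian)
    if (length(x) == 0) break           # nothing more could be read
    out[pos + seq_along(x)] <- x
    pos <- pos + length(x)
  }
  out
}

# Small self check on a temporary file: 2500 values, chunks of 1000.
tmp <- tempfile(fileext = ".bin")
writeBin(1:2500, tmp, size = 4, endian = "little")
res <- read_in_chunks(tmp, chunk = 1000)
unlink(tmp)
```

Preallocating the result avoids holding a list of pieces and the final
vector in memory at the same time, which matters for a file near 2GB.
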
> On Thu, 1 Sep 2011, Benton, Paul wrote:
> 
> > Posting for a friend
> >
> > Begin forwarded message:
> >
> > From: "Geier, Florian" <florian.geier08 at imperial.ac.uk>
> > Subject: Fwd: readBin fails to read large files
> > Date: September 1, 2011 4:10:53 PM GMT+01:00
> > To:
> >
> >
> >
> > Begin forwarded message:
> >
> > Date: 1 September 2011 16:01:45 GMT+01:00
> > Subject: readBin fails to read large files
> >
> > Dear all,
> >
> > I am trying to read a large file (~2GB) of unsigned ints into R. Using the command:
> >
> > raw<-readBin("file",n=10^8, integer(),endian="little",signed=FALSE)
> >
> > It works fine for n=10^8, but fails for n=10^9 (or even at n=6*10^8). My .Machine$sizeof.long is 8 bytes.
> > I am running R 2.13.1 on a x86_64-apple-darwin9.8.0/x86_64 (64-bit) architecture.
> >
> > Thanks for your help
> >
> > Florian
> >
> > --
> > AXA doctoral fellow
> > Bundy lab - Biomolecular Medicine
> > Imperial College London