[R] Fast reading of hex data?

Fang zhou.zfang at gmail.com
Tue May 8 11:44:03 CEST 2012


Hi all,

Basically, I have text files (up to 1 GB in size) containing stuff like:

F34060F81000F28055F8A000F2E05EF8F000F34 (...)

The data is a long string of hex digits (9 = 9, A = 10, B = 11, ...)
organised in fixed, small blocks. What I want to do is read in a specified
segment of the string, break it up into blocks, and convert it into a
vector of integers for further processing. I want to do this fast, and
hopefully without using masses of memory. So, I'm wondering if anyone has
any better ideas than what I'm doing - well, anything that would make a
sizable difference anyway.
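
For concreteness, with a (made-up) blocksize of 4, the string above would
split into "F340", "60F8", "1000", "F280", ... and should come out as the
integer vector 62272, 24824, 4096, 62080, ...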

Right now, my methodology is the following:

1. Use mmap() (from the mmap package) to map the file to a memory-mapped
variable, reading each byte as a uint8 integer:

    obj <- mmap("file.txt", mode = uint8())
    tmp <- obj[bytepos]

2. Convert the ASCII code of each byte into the corresponding hex digit
value ('0'..'9' are codes 48..57; 'A'..'F' are 65..70, hence the extra 7
for letters):

    tmp <- tmp - 48 - 7 * (tmp > 64)

3. Collate each run of blocksize digit values into a single integer:

    tmp <- matrix(tmp, ncol = blocksize, byrow = TRUE) %*% 16^((blocksize - 1):0)
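
Putting the steps together, a minimal sketch of the whole pipeline looks
like this (file name, segment, and block width are all placeholders; the
segment is assumed to cover a whole number of blocks):

    library(mmap)

    blocksize <- 4                     # placeholder block width (hex digits)
    bytepos   <- 1:4096                # placeholder segment of the file
    obj <- mmap("file.txt", mode = uint8())
    tmp <- obj[bytepos]                # ASCII codes of the hex characters
    munmap(obj)

    tmp  <- tmp - 48 - 7 * (tmp > 64)  # '0'..'9' -> 0..9, 'A'..'F' -> 10..15
    vals <- as.vector(
        matrix(tmp, ncol = blocksize, byrow = TRUE) %*% 16^((blocksize - 1):0)
    )

Note that the %*% step yields doubles, which stay exact up to 13 hex
digits per block.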

Now, my question is: is there a better way? My attempts with rawToChar and
strtoi seem to take drastically longer for a reasonably lengthy bytepos,
presumably because of string manipulation/storage, but possibly I am doing
it wrong somehow. If there is no better way in R, would there be much value
in implementing this in C, for example, or would the computational
improvement be small?
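
For reference, my rawToChar/strtoi attempt was roughly the following
(nbytes is again a placeholder, and the segment is assumed to start on a
block boundary):

    nbytes <- 4096                       # placeholder segment length
    con <- file("file.txt", "rb")
    raw <- readBin(con, what = "raw", n = nbytes)
    close(con)

    s      <- rawToChar(raw)             # one big string of hex digits
    starts <- seq(1, nchar(s), by = blocksize)
    vals   <- strtoi(substring(s, starts, starts + blocksize - 1), base = 16L)
    # note: strtoi() returns NA for blocks above .Machine$integer.max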

Thanks,

Zhou
