[R] seek(), skip by bits (not by bytes) in binary file

Tue Jun 19 18:10:00 CEST 2012

I am not sure why reading through the file bit by bit gets you to
where you want to be.  I assume that the file has some structure, even
though it may change daily.  You mentioned the various types of data
that it might contain; are they all in byte-sized chunks?  If you
really have data that begins in the middle of a byte and then extends
over several bytes, you will have to write functions that pull out
this data and reconstruct it into an object (e.g., integer, numeric,
...) that R understands.  Can you provide some more detail on what the
data actually looks like and how you would find the "pattern" in it?
Almost all systems read byte-sized chunks at the lowest level, so if
you really have to get down to the bit level to reconstruct the data,
you have to write the unpack/pack functions yourself.  This can all be
done once you understand the structure of the data, so some examples
would be useful if you want someone to propose a solution.
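
To give an idea of what such an unpack function might look like, here
is a minimal sketch.  It is not based on your actual file format:
extract_bits() is a hypothetical helper name, and I am assuming the
field is an unsigned integer of at most 31 bits, stored
most-significant-bit first, that may start in the middle of a byte.

```r
# Hypothetical helper (assumption: MSB-first bit order, unsigned field).
# Extracts `n_bits` bits starting at the 0-based `bit_offset` of a raw vector.
extract_bits <- function(bytes, bit_offset, n_bits) {
  stopifnot(is.raw(bytes), n_bits >= 1, n_bits <= 31,
            bit_offset >= 0, bit_offset + n_bits <= 8 * length(bytes))
  # rawToBits() returns the least-significant bit of each byte first,
  # so reverse every group of 8 to get MSB-first order.
  bits <- as.integer(rawToBits(bytes))
  bits <- as.vector(matrix(bits, nrow = 8)[8:1, , drop = FALSE])
  field <- bits[bit_offset + seq_len(n_bits)]
  # Combine the selected bits into one unsigned value.
  sum(field * 2^((n_bits - 1):0))
}

# Two bytes, 0xAB 0xCD = 1010 1011 1100 1101 in binary.
bytes <- as.raw(c(0xAB, 0xCD))
middle <- extract_bits(bytes, bit_offset = 4, n_bits = 8)  # bits 1011 1100
```

The raw vector would come from readBin(con, "raw", n = ...), where n
counts bytes.  If your data uses a different bit order or signed
values, the reversal and the final sum would need to change.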

On Tue, Jun 19, 2012 at 11:54 AM, Ben quant <ccquant at gmail.com> wrote:
> Hello,
>
> Has a function been built that will skip to a certain bit in a binary file?
>
> As of 2009 the answer was 'no':
> http://r.789695.n4.nabble.com/read-binary-file-seek-td900847.html
> https://stat.ethz.ch/pipermail/r-help/2009-May/199819.html
>
> If you feel I don't need to (like in the links above), please provide some
> help. (Note this is my first time working with binary files.)
>
> I'm still working on the script, but here is where I am right now. The for
> loop is being used because:
>
> 1) I have to get down to correct position then get the info I want/need.
> The stuff I am reading through (x) is not fully understood and it is a mix
> of various chars, floats, integers, etc. of various sizes etc. so I don't
> know how many bytes to read in unless I read them bit by bit. (The
> information and structure of the information changes daily so I'm skipping
> over it.)
> 2) If I skip all in one readBin() my 'n' value is often up to 20 times too
> big (I get an error) and/or R won't let me "allocate a vector of size...."
> etc. So I split it up into chunks (divide by 20 etc.) and read each chunk
> then trash each part that is readBin()'d. Then the last line I get the data
> that I want (data1).
>
> Here is my working code:
>
> # I have to read 'junk' bits from the to.read file, which is a huge integer,
> # so I divide it up and loop through to.read in parts (jb_part).
>  divr = 20
>  mod = junk %% divr
>
>  jb_part = as.integer(junk/divr)
>  jb_part_mod = jb_part + mod # catch the remainder/modulus
>
>  # connect to the binary file
>  to.read = file(paste(dbs_path,"/",dbs_file,sep=""),"rb")
> # loop in chunks to where I want to be
>  for(i in 1:(divr-1)){
>    x = readBin(to.read,"raw",n=jb_part,size=1)
>    x = NULL # trash the result b/c I don't want it
>  }
> # read a little more to include the remainder/modulus bits left over by
> # dividing by 20 above
>  x = readBin(to.read,'raw',n=jb_part_mod,size=1)
>  x = NULL # trash it
>
> # finally get the data that I want
> data1 = readBin(to.read,double(),n=some_number,size=size_to_use)
>
> This works, but it is SLOW!  Any ideas on how to get down to the correct
> bit a bit quicker (pun intended). :)
>
> Thanks for any help!
>
> Ben

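On the skipping loop quoted above: readBin(..., "raw", n = junk)
counts bytes, not bits, so 'junk' there is really a byte count.  If
the data of interest starts on a byte boundary, seek() may let you
jump there without reading anything.  The following is only a sketch
on a made-up demo file; the file contents, the offset of 1000 bytes
and the three doubles are assumptions for illustration, not your data.

```r
# Build a small demo file: 1000 "junk" bytes followed by three doubles.
path <- tempfile(fileext = ".bin")
out <- file(path, "wb")
writeBin(as.raw(rep(0:255, length.out = 1000)), out)
writeBin(c(1.5, 2.5, 3.5), out, size = 8)
close(out)

junk <- 1000  # number of bytes to skip (assumed known)

to.read <- file(path, "rb")
# Move the read position to byte offset `junk` instead of reading and
# discarding the bytes; seek() returns the previous position.
invisible(seek(to.read, where = junk, origin = "start"))
data1 <- readBin(to.read, double(), n = 3, size = 8)
close(to.read)
unlink(path)
```

Note that ?seek warns that its use on Windows is discouraged, so this
is worth checking on your platform before relying on it.
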
-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.