[R] seek(), skip by bits (not by bytes) in binary file

Jeff Newmiller jdnewmil at dcn.davis.CA.us
Tue Jun 19 19:07:23 CEST 2012


If the structure really changes day by day, then you have to decipher how it is constructed in order to find the correct bit to go to. 

If you think you already know which bit to go to, then you know it as something like "the 3rd bit of the 71st byte". That means the existing seek() function is sufficient to get to that byte, and you can then pick apart its bits to get the ones you want.
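
As a rough illustration of the seek-then-pick-apart-the-bits approach, here is a minimal base-R sketch. The helper name get_bit() and the file name in the usage comment are hypothetical, and it assumes bits are counted from the least significant end of the byte; check the file format's own convention before relying on it.

```r
# Read one byte at a 1-based position and return one of its bits.
# Assumption: bit_index counts from the least significant bit (1 = LSB);
# formats that number bits from the most significant end need 9 - bit_index.
get_bit <- function(path, byte_index, bit_index) {
  con <- file(path, "rb")
  on.exit(close(con))
  # seek() offsets are 0-based, so byte N starts at offset N - 1
  seek(con, where = byte_index - 1, origin = "start")
  b <- readBin(con, what = "raw", n = 1)
  if (length(b) == 0) stop("file is shorter than byte_index")
  # rawToBits() expands a byte into 8 bits, least significant first
  as.integer(rawToBits(b)[bit_index])
}

# e.g. the 3rd bit of the 71st byte of a (hypothetical) file:
# get_bit("some_file.dbs", byte_index = 71, bit_index = 3)
```

Only one byte is ever read, so the position in the file costs nothing regardless of how large the skipped region is.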

I recommend using the hexView package for this kind of task.
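
For the deciphering step, a plain hex dump of the first few hundred bytes is often enough to spot headers, record lengths and embedded text. The following is a minimal base-R sketch; hex_dump() is a hypothetical helper, not part of hexView or any other package. It prints offsets, hex bytes and printable ASCII side by side.

```r
# Print the first `n` bytes of a file as rows of: offset, hex bytes, ASCII.
# Non-printable bytes are shown as "." in the ASCII column.
hex_dump <- function(path, n = 256L, width = 16L) {
  con <- file(path, "rb")
  on.exit(close(con))
  bytes <- readBin(con, what = "raw", n = n)  # may return fewer than n bytes
  if (length(bytes) == 0L) return(invisible(character(0)))
  fmt <- paste0("%08x  %-", width * 3L, "s %s")
  starts <- seq(1L, length(bytes), by = width)
  lines <- vapply(starts, function(s) {
    chunk <- bytes[s:min(s + width - 1L, length(bytes))]
    v <- as.integer(chunk)
    ascii <- rep(".", length(v))
    printable <- v >= 32L & v <= 126L  # visible ASCII range
    ascii[printable] <- intToUtf8(v[printable], multiple = TRUE)
    sprintf(fmt, s - 1L, paste(as.character(chunk), collapse = " "),
            paste(ascii, collapse = ""))
  }, character(1))
  cat(lines, sep = "\n")
  invisible(lines)
}
```

Comparing dumps of two files from different sources side by side is one way to see which parts of the structure actually change.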
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.



Ben quant <ccquant at gmail.com> wrote:

>Other people at my firm who know a lot about binary files couldn't figure
>out the parts of the file that I am skipping over. Part of the issue is
>that there are several different files (dbs extension files) like this
>that I have to process and the structures do change depending on the
>source of these files.
>
>In short, the problem is over my head and I was hoping to go right to the
>correct bit and read, which would make things much easier. I guess not...
>Thanks for your help though.
>
>Anyone else?
>
>thanks,
>
>ben
>
>On Tue, Jun 19, 2012 at 10:10 AM, jim holtman <jholtman at gmail.com> wrote:
>
>> I am not sure why reading through 'bit-by-bit' gets you to where you
>> want to be.  I assume that the file has some structure, even though it
>> may be changing daily.  You mentioned the various types of data that
>> it might contain; are they all in byte-sized chunks?  If you really
>> have data that begins in the middle of a byte and then extends over
>> several bytes, you will have to write some functions that pull
>> out this data and then reconstruct it into an object (e.g., integer,
>> numeric, ...) that R understands.  Can you provide some more
>> definition of what the data actually looks like and how you would find
>> the "pattern" of the data?  Almost all systems read at the lowest
>> level in byte-sized chunks, and if you really have to get down to the
>> bit level to reconstruct the data, then you have to write the
>> unpack/pack functions.  This can all be done once you understand the
>> structure of the data.  So some examples would be useful if you want
>> someone to propose a solution.
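
A minimal sketch of such an unpack function in base R, assuming a field that starts at an arbitrary bit offset, is stored most significant bit first, and is at most 53 bits wide (so a double holds it exactly). The name unpack_bits() is hypothetical; signed fields, LSB-first bit order or floating-point layouts would each need their own handling.

```r
# Extract an unsigned integer of `nbits` bits that starts `bit_offset` bits
# (0-based) into a raw vector. Assumption: bits are numbered from the most
# significant bit of each byte and the field is stored most significant bit
# first; LSB-first formats need the bit order reversed.
unpack_bits <- function(bytes, bit_offset, nbits) {
  stopifnot(is.raw(bytes), nbits >= 1, nbits <= 53,
            bit_offset >= 0, bit_offset + nbits <= 8 * length(bytes))
  # rawToBits() gives 8 bits per byte, least significant first;
  # flip the rows so every byte (column) reads most significant bit first
  bits <- matrix(as.integer(rawToBits(bytes)), nrow = 8)[8:1, , drop = FALSE]
  field <- as.vector(bits)[bit_offset + seq_len(nbits)]
  # weight the bits by powers of two (exact for up to 53 bits in a double)
  sum(field * 2^((nbits - 1):0))
}
```

In practice the covering bytes would come from seek() plus readBin(con, "raw", n = ...), with bit_offset measured from the first of those bytes.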
>>
>> On Tue, Jun 19, 2012 at 11:54 AM, Ben quant <ccquant at gmail.com> wrote:
>> > Hello,
>> >
>> > Has a function been built that will skip to a certain bit in a binary
>> > file?
>> >
>> > As of 2009 the answer was 'no':
>> > http://r.789695.n4.nabble.com/read-binary-file-seek-td900847.html
>> > https://stat.ethz.ch/pipermail/r-help/2009-May/199819.html
>> >
>> > If you feel I don't need one (as in the links above), please provide
>> > some help. (Note that this is my first time working with binary files.)
>> >
>> > I'm still working on the script, but here is where I am right now. The
>> > for loop is being used because:
>> >
>> > 1) I have to get down to the correct position and then get the info I
>> > want/need. The stuff I am reading through (x) is not fully understood,
>> > and it is a mix of various chars, floats, integers, etc. of various
>> > sizes, so I don't know how many bytes to read in unless I read them bit
>> > by bit. (The information and the structure of the information change
>> > daily, so I'm skipping over it.)
>> > 2) If I skip everything in one readBin() call, my 'n' value is often up
>> > to 20 times too big (I get an error) and/or R won't let me "allocate a
>> > vector of size...." etc. So I split it up into chunks (divide by 20
>> > etc.), read each chunk, then trash each part that is readBin()'d. Then
>> > in the last line I get the data that I want (data1).
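
For the byte-aligned parts (chars, floats, integers of various sizes), readBin() and readChar() can read fields one after another once their sizes and byte order are known. The record layout below (4-byte little-endian integer, 8-byte little-endian double, 8-character space-padded label) is purely hypothetical, as is read_record(); it only illustrates the pattern, not the .dbs format.

```r
# Read one record of a HYPOTHETICAL fixed layout from an open "rb" connection:
#   4-byte little-endian integer id
#   8-byte little-endian IEEE double value
#   8-byte ASCII label, padded with spaces
read_record <- function(con) {
  id    <- readBin(con, what = "integer", n = 1, size = 4, endian = "little")
  value <- readBin(con, what = "double",  n = 1, size = 8, endian = "little")
  label <- readChar(con, nchars = 8, useBytes = TRUE)
  list(id = id, value = value, label = trimws(label))
}
```

Each call advances the connection by exactly 20 bytes, so once a layout is known, its size also tells seek() how far to jump.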
>> >
>> > Here is my working code:
>> >
>> > # I have to skip 'junk' bytes of the to.read file; junk is a huge integer,
>> > # so I divide it up and loop through to.read in parts (jb_part).
>> >  divr = 20
>> >  mod = junk %% divr
>> >
>> >  jb_part = as.integer(junk/divr)
>> >  jb_part_mod = jb_part + mod # catch the remainder/modulus
>> >
>> >  to.read = file(paste(dbs_path,"/",dbs_file,sep=""),"rb") # connect to the binary file
>> > # loop in chunks to where I want to be
>> >  for(i in 1:(divr-1)){
>> >    x = readBin(to.read,"raw",n=jb_part,size=1)
>> >    x = NULL # trash the result b/c I don't want it
>> >  }
>> > # read a little more to include the remainder/modulus bytes left over by
>> > # dividing by 20 above
>> >  x = readBin(to.read,'raw',n=jb_part_mod,size=1)
>> >  x = NULL # trash it
>> >
>> > # finally get the data that I want
>> > data1 = readBin(to.read,double(),n=some_number,size=size_to_use)
>> >
>> > This works, but it is SLOW!  Any ideas on how to get down to the correct
>> > bit a bit quicker (pun intended)? :)
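
Assuming junk is a byte count (which is what the readBin(..., "raw", size = 1) loop above actually skips), the whole chunked loop can likely be replaced by a single seek(), which moves the file position without allocating anything for the skipped bytes. A minimal sketch with a hypothetical helper read_after_skip():

```r
# Skip `skip_bytes` bytes with seek() instead of reading and discarding them,
# then read `n` doubles of `size` bytes each (native byte order).
read_after_skip <- function(path, skip_bytes, n, size = 8L) {
  con <- file(path, "rb")
  on.exit(close(con))
  # jump straight to the 0-based byte offset; nothing is read or allocated
  # for the skipped region
  seek(con, where = skip_bytes, origin = "start")
  readBin(con, what = "double", n = n, size = size)
}
```

With the variables from the script above this would be read_after_skip(file.path(dbs_path, dbs_file), junk, some_number, size_to_use). If the offset really is counted in bits, seek to junk %/% 8 and extract the remaining junk %% 8 bits from the next byte by hand. R's own ?seek help page discourages relying on file positioning on Windows, so results there are worth double-checking.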
>> >
>> > Thanks for any help!
>> >
>> > Ben
>> >
>> >        [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>> --
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>>
>


