[R] More help with Binary Files

Steve_Friedman at nps.gov Steve_Friedman at nps.gov
Wed Feb 11 21:25:52 CET 2009



Does anyone else have any insights into this issue:

Henrik, thank you for your very quick response.  I've examined the readBin
help file with respect to endian and I'm still not sure I'm getting this
correct.

Here is what I'm coding:

con <- file(file.choose(), open="rb")
Year66 <- readBin(con, what=integer(), signed=TRUE, size=2,
                  endian="little", n=40374840)    # define endian="little"
length(Year66)
close(con)

# convert millimeters to inches
Year66.in <- Year66 * 0.039370

library(Hmisc)   # describe() comes from the Hmisc package
describe(Year66.in)
Year66.in
      n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
8185584       0   65511  -21.56  -650.1  -650.1  -162.2     0.0     0.0   636.5   639.1

lowest : -1290 -1290 -1290 -1290 -1290, highest:  1290  1290  1290  1290  1290

# establish cut points using inches
bins <- cut(Year66.in, breaks=30)
barplot(table(bins))

length(Year66.in)  # returns 8185584 records read, about 20.3% of the
                   # records I'm expecting (see next line)
length(Year66.in) / (419*264*365)  # proportion of records expected in one year
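
The counts above can be cross-checked with a little arithmetic. The sketch below is only an illustration under stated assumptions: 2-byte values, no header bytes in the file, and 365-day years. check_size() is a hypothetical helper, not part of the code posted in this thread.

```r
# Grid and record layout as described in this thread (assumed: no file header,
# 2 bytes per value, 365 days in every year).
rows <- 419; cols <- 264; days <- 365; years <- 37
bytes_per_value <- 2

cells_per_day   <- rows * cols                 # values in one daily grid
values_per_year <- cells_per_day * days        # 40374840, the n passed to readBin
values_total    <- values_per_year * years     # 1493869080
expected_bytes  <- values_total * bytes_per_value

# Hypothetical helper: compare the size on disk with the size the layout implies.
check_size <- function(path, expected_bytes) {
  actual <- file.info(path)$size
  c(actual = actual, expected = expected_bytes, ratio = actual / expected_bytes)
}

# How much of one year did the 8185584 values cover?
n_read <- 8185584
n_read / values_per_year   # about 0.203
n_read / cells_per_day     # 74, i.e. a whole number of daily grids
```

Calling check_size(path, expected_bytes) on the real file would show whether it is as large as this layout implies; a ratio well below 1 would suggest the file holds fewer values than expected or is stored differently.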

#### Here I will introduce code to classify the summary statistics using
#### both a clustering and a non-metric scaling function.  These procedures
#### will hopefully enable differentiation of cluster groupings, associating
#### the initial input annual values with a separate (not shown) calculated
#### index.


What I eventually want to accomplish is a statistical summary for each of
the 37 years in the binary file.  Reading in the file on a year-by-year
basis (n=40374840) should give me all of the records for just the first
year, not all of the records in the binary file.  I therefore also need to
better understand how to read the set of records for years 2, 3, 4, ... 37.
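
One possible approach to the year-by-year reading, sketched under assumptions rather than tested against this file: keep a single connection open and call readBin() once per year, since each call continues from where the previous one stopped. summarise_years() and its arguments are hypothetical names, and the sketch assumes 16-bit little-endian values with no header and no leap days.

```r
# Hypothetical helper: read consecutive fixed-length blocks (one per year)
# from one open connection and keep only a small summary of each block.
summarise_years <- function(path, n_per_year, n_years,
                            size = 2L, endian = "little") {
  con <- file(path, open = "rb")
  on.exit(close(con))
  out <- vector("list", n_years)
  for (k in seq_len(n_years)) {
    # Each readBin() call resumes at the current position of the connection.
    x <- readBin(con, what = integer(), n = n_per_year,
                 size = size, signed = TRUE, endian = endian)
    if (length(x) == 0L) break            # nothing left in the file
    out[[k]] <- c(year = k, n = length(x),
                  mean = mean(x), min = min(x), max = max(x))
    if (length(x) < n_per_year) {
      warning("year ", k, ": expected ", n_per_year,
              " values, got ", length(x))
      break
    }
  }
  do.call(rbind, out)                     # one row per year actually read
}

# Tiny stand-in file: 3 "years" of 4 values each, written as 16-bit integers.
tmp <- tempfile(fileext = ".bin")
writeBin(as.integer(1:12), tmp, size = 2L, endian = "little")
res <- summarise_years(tmp, n_per_year = 4, n_years = 3)
res
```

For the file discussed here the call might look like summarise_years(path, n_per_year = 419*264*365, n_years = 37); a short final block would indicate that the file ends earlier than the assumed layout.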

Any ideas?
Thanks for your assistance.

Steve

Steve Friedman Ph. D.
Spatial Statistical Analyst
Everglades and Dry Tortugas National Park
950 N Krome Ave (3rd Floor)
Homestead, Florida 33034

Steve_Friedman at nps.gov
Office (305) 224 - 4282
Fax     (305) 224 - 4147


                                                                           
Henrik Bengtsson <hb at stat.berkeley.edu>
Sent by: henrik.bengtsson@gmail.com
02/11/2009 09:20 AM PST

To: Steve_Friedman at nps.gov
cc: r-help at r-project.org
Subject: Re: [R] Reading Binary Files




Argument 'size' is what you are looking for, cf. help(readBin).
Whenever reading binary files this way, I strongly recommend that you
are explicit about all arguments of readBin(), e.g.

readBin(con, what=integer(), size=2, signed=TRUE, endian="little", n=n);

For instance, you probably do not want 'endian' to be dependent on the
platform (see help) you run on, but instead be specific to the file
format you are reading.
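
To illustrate why being explicit matters, here is a small self-contained sketch (not taken from the file under discussion). It writes four 16-bit values to a temporary file and reads them back with matching and mismatching 'size' and 'endian' settings. The values and variable names are made up for the example.

```r
# Write four made-up values as 2-byte little-endian integers.
vals <- c(-650L, 0L, 636L, 1290L)
tmp <- tempfile(fileext = ".bin")
writeBin(vals, tmp, size = 2L, endian = "little")

# Matching settings: the original values come back.
ok <- readBin(tmp, what = integer(), n = 4, size = 2,
              signed = TRUE, endian = "little")

# Wrong byte order: same number of values, but different numbers.
swapped <- readBin(tmp, what = integer(), n = 4, size = 2,
                   signed = TRUE, endian = "big")

# Wrong size: the 8 bytes are consumed as 4-byte integers, so fewer
# values are returned than were requested.
as_int32 <- readBin(tmp, what = integer(), n = 4, size = 4,
                    endian = "little")

identical(ok, vals)   # TRUE
length(as_int32)      # 2
```

A result that is shorter than the requested n, as in the last read, is one sign that 'size' does not match the file or that the file is shorter than assumed.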

/Henrik

On Wed, Feb 11, 2009 at 8:04 AM,  <Steve_Friedman at nps.gov> wrote:
>
> Hello
>
> I'm encountering some difficulty correctly reading binary files. The binary
> files store data as "short" rather than "double", "int", or any of the
> other modes of the vector being read.
>
> The data represent a regular grid of 419 rows by 264 columns. To make
> it more interesting, the data are daily records, for a total of 37 years.
> The file size is therefore 419 (rows) * 264 (columns) * 365 (days) *
> 37 (years) long.
>
> The product  of these dimensions is 1493869080 records.
>
> I'm using the following code to read these into R (windows 2.8.1 )
>
>  con <- file(file.choose(), open="rb")
>  Year66 <- readBin(con, integer, signed=TRUE, n = 40374840)
> close(con)
>
> length(Year66)
>
> returns 2046396
>
> I'm betting that I'm defining the "what" incorrectly, but after numerous
> attempts with different choices I'm wondering if readBin can handle "short"
> values?
>
> Any help is greatly appreciated.
>
> Steve
>
>
> Steve Friedman Ph. D.
> Spatial Statistical Analyst
> Everglades and Dry Tortugas National Park
> 950 N Krome Ave (3rd Floor)
> Homestead, Florida 33034
>
> Steve_Friedman at nps.gov
> Office (305) 224 - 4282
> Fax     (305) 224 - 4147
>






