# [R] More help with Binary Files

Steve_Friedman at nps.gov Steve_Friedman at nps.gov
Wed Feb 11 21:25:52 CET 2009

```
Does anyone else have any insights into this issue?

Henrik, thank you for your very quick response.  I've examined the readBin
help file with respect to endian and I'm still not sure I'm getting this
correct.

Here is what I'm coding:

con <- file(file.choose(), open="rb")
Year66 <- readBin(con, what=integer(), signed = TRUE, size = 2,
endian="little",  n = 40374840)    # define endian= "little"
length(Year66)
close(con)

# convert millimeters to inches
Year66.in <- Year66 * 0.039370

describe(Year66.in)
Year66.in
      n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
8185584       0   65511  -21.56  -650.1  -650.1  -162.2     0.0     0.0   636.5   639.1

lowest : -1290 -1290 -1290 -1290 -1290, highest:  1290  1290  1290  1290  1290

# establish cut points using inches
bins <- cut(Year66.in, breaks=30)
barplot(table(bins))

# This returns the number of records read: 8185584, or 20.2% (see next
# line) of the records that I'm expecting.
length(Year66.in)
# Returns the proportion of the records expected in one year.
length(Year66.in) / (419*264*365)
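
# A quick sanity check on the file itself may help here. This is only a
# minimal sketch, assuming a headerless file of 2-byte values; the temporary
# file and its contents are hypothetical stand-ins for the real data.
tmp <- tempfile(fileext = ".bin")
writeBin(1:10, tmp, size = 2, endian = "little")  # write ten 2-byte records
n.records <- file.info(tmp)$size / 2              # bytes on disk / 2 bytes per record
unlink(tmp)
# For the count reported above, 8185584 / (419 * 264) is exactly 74,
# i.e. a whole number of 419 x 264 daily grids.
n.grids <- 8185584 / (419 * 264)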

#### Here I will introduce code to classify the summary statistics using
#### both a clustering and a non-metric scaling function.  These procedures
#### will hopefully enable differentiation of cluster-groupings, associating
#### the initial input annual year values with a separate (not shown)
#### calculated index.

What I eventually want to accomplish is a statistical summary for each of
the 37 years in the binary file.  Reading in the file on a year-by-year
basis (n=40374840) should give me all of the records for just the first
year, not all of the records in the binary file.  I therefore also need to
better understand how to read the set of records for years 2, 3, 4, ... 37.
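
# One possible approach, sketched under assumptions: the file is a headerless
# stream of 2-byte little-endian signed integers with exactly 365 daily grids
# per year (neither is confirmed in this thread). Successive readBin() calls
# on an open connection continue where the previous call stopped, so each
# pass of the loop reads the next year. A small temporary file with
# hypothetical toy dimensions stands in for the real data.
n.row <- 4; n.col <- 3; n.day <- 5; n.year <- 3   # hypothetical toy sizes
per.year <- n.row * n.col * n.day                 # records in one year
tmp <- tempfile(fileext = ".bin")
writeBin(as.integer(seq_len(per.year * n.year) %% 100), tmp,
         size = 2, endian = "little")

con <- file(tmp, open = "rb")
year.means <- numeric(n.year)
for (yr in seq_len(n.year)) {
  # read exactly one year's worth of records from the current position
  x <- readBin(con, what = integer(), size = 2, signed = TRUE,
               endian = "little", n = per.year)
  if (length(x) < per.year) stop("file ended early in year ", yr)
  year.means[yr] <- mean(x)                       # per-year summary statistic
}
close(con)
unlink(tmp)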

Any ideas ?

Steve

Steve Friedman Ph. D.
Spatial Statistical Analyst
Everglades and Dry Tortugas National Park
950 N Krome Ave (3rd Floor)

Steve_Friedman at nps.gov
Office (305) 224 - 4282
Fax     (305) 224 - 4147

Henrik Bengtsson <hb at stat.berkeley.edu>
Sent by: henrik.bengtsson@gmail.com
02/11/2009 09:20 AM PST
To: Steve_Friedman at nps.gov
cc: r-help at r-project.org
Subject:

Argument 'size' is what you are looking for, cf. help(readBin).
Whenever reading binary files this way, I strongly recommend that you
specify all arguments explicitly, e.g.

readBin(con, what=integer(), size=2, signed=TRUE, endian="little", n=n);

For instance, you probably do not want 'endian' to depend on the
platform you run on (see help), but instead be specific to the file.
/Henrik

On Wed, Feb 11, 2009 at 8:04 AM,  <Steve_Friedman at nps.gov> wrote:
>
> Hello
>
> I'm encountering some difficulty correctly reading binary files. The
> binary files store data as "short" rather than "double", "int", or any
> of the other modes of the vector being read.
>
> The data represent a regular grid of size 419 rows by 264 columns. To
> make it more interesting, the data are daily records, for a total of 37
> years. The file size is therefore 419 (rows) * 264 (columns) * 365 (days)
> * 37 (years) long.
>
> The product of these dimensions is 1493869080 records.
>
> I'm using the following code to read these into R (2.8.1 on Windows):
>
>  con <- file(file.choose(), open="rb")
>  Year66 <- readBin(con, integer, signed=TRUE, n = 40374840)
> close(con)
>
> length(Year66)
>
> returns 2046396
>
> I'm betting that I'm defining the "what" incorrectly, but after numerous
> attempts with different choices I'm wondering if readBin can handle
> "short" values?
>
> Any help is greatly appreciated.
>
> Steve
>
>
> Steve Friedman Ph. D.
> Spatial Statistical Analyst
> Everglades and Dry Tortugas National Park
> 950 N Krome Ave (3rd Floor)
>
> Steve_Friedman at nps.gov
> Office (305) 224 - 4282
> Fax     (305) 224 - 4147
>
