[R] Fixed Width EBCDIC Files in R

John McKown john.archie.mckown at gmail.com
Thu Feb 5 23:06:17 CET 2015


On Thu, Feb 5, 2015 at 2:08 PM, Brian Trautman <btrautman84 at gmail.com>
wrote:

> I'm trying to read some mainframe data encoded as EBCDIC into R, and am at
> a loss. I'd like to avoid using an external program to convert the files,
> since I'm operating in a corporate environment.
>
> You can find the example files at the link below, with both ASCII and
> EBCDIC versions. Note that there are no linebreaks in the EBCDIC versions
> of the file -- instead, I'd be specifying the width of each line manually.
> R has the IBM500 encoding available in my environment, which should be the
> correct one for these files.
>
> However, when I run the following commands, R seems to fail entirely.  It
> loads a single record with garbage characters, regardless of the encoding I
> specified.
>
>
> layout <- read.fwf("EBCDIC_LAYOUT", widths = c(80), fileEncoding='ibm500')
>
> data   <- read.fwf("EBCDIC_ZIPCODE", widths = c(32), fileEncoding='ibm500')
>
>
> Where might I go from here?
>
> Related -- some of the files I expect to use will be fairly large (1 GB or
> so). Preferably, I'd like a solution that scales reasonably well. (I tried
> packages like LaF, but they don't have the option to select encoding.)
>
> Thank you very much!
>
>
> Example files --
> https://drive.google.com/open?id=0ByvX1v-WqaaASTdwV2ZYS0pBV00&authuser=0
>
>
I gave this a short try. What killed me (see below) is that your file
EBCDIC_ZIPCODE has embedded NULL characters, \0. My transcript:

> file<-file("EBCDIC_ZIPCODE",encoding="IBM500", raw=TRUE);
> data=read.fwf(file,widths=c(32));
Warning messages:
1: In readLines(file, n = thisblock) :
  line 1 appears to contain an embedded nul
2: In readLines(file, n = thisblock) :
  incomplete final line found on 'EBCDIC_ZIPCODE'
> View(data)

I don't know how to get past the embedded NULL. I'm a UNIX user, so my
thought (not applicable given your "pure R" restriction) would be to use
"tr" to convert the \0 bytes to EBCDIC spaces (0x40, not the ASCII space
0x20, since the file is still EBCDIC at that point), then use the above.
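
One pure-R avenue that may be worth trying is to skip readLines() altogether, since the file has no line breaks for it to split on. The sketch below has not been run against your files. It reads the file as raw bytes with readBin(), swaps NUL bytes for the EBCDIC space (0x40), cuts the bytes into fixed-width records, and decodes them with iconv(). The function name read_ebcdic_records and its arguments are made up for illustration, and it assumes that the NULs are padding and that your platform's iconv knows IBM500. If the records contain packed-decimal or binary fields, a character conversion like this will mangle them.

```r
# Hypothetical helper: read fixed-width EBCDIC records without readLines().
read_ebcdic_records <- function(path, record_width, encoding = "IBM500") {
  # Slurp the whole file as raw bytes; nothing is decoded yet.
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)

  # Assumption: NUL bytes are padding, so replace them with EBCDIC space (0x40).
  bytes[bytes == as.raw(0x00)] <- as.raw(0x40)

  # Cut the byte stream into whole records; a trailing partial record is dropped.
  n_records <- length(bytes) %/% record_width
  starts <- (seq_len(n_records) - 1L) * record_width
  records <- lapply(starts, function(s) bytes[s + seq_len(record_width)])

  # iconv() accepts a list of raw vectors and returns a character vector.
  iconv(records, from = encoding, to = "UTF-8")
}
```

Something like read_ebcdic_records("EBCDIC_ZIPCODE", 32) should then give one string per record, which could be passed to read.fwf() through textConnection() with the real field widths. For files around 1 GB it would probably be better to open the file with file(path, "rb") and call readBin() repeatedly on a multiple of the record width, rather than loading everything at once.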


-- 
He's about as useful as a wax frying pan.

10 to the 12th power microphones = 1 Megaphone

Maranatha! <><
John McKown

More information about the R-help mailing list