[R] Fixed Width EBCDIC Files in R
John McKown
john.archie.mckown at gmail.com
Thu Feb 5 23:06:17 CET 2015
On Thu, Feb 5, 2015 at 2:08 PM, Brian Trautman <btrautman84 at gmail.com>
wrote:
> I'm trying to read some mainframe data encoded as EBCDIC into R, and am at
> a loss. I'd like to avoid using an external program to convert the files,
> since I'm operating in a corporate environment.
>
> You can find the example files at the link below, with both ASCII and
> EBCDIC versions. Note that there are no linebreaks in the EBCDIC versions
> of the file -- instead, I'd be specifying the width of each line manually.
> R has the IBM500 encoding available in my environment, which should be the
> correct one for these files.
>
> However, when I run the following commands, R seems to fail entirely. It
> loads a single record with garbage characters, regardless of the encoding I
> specified.
>
>
> layout <- read.fwf("EBCDIC_LAYOUT", widths = c(80), fileEncoding='ibm500')
>
> data <- read.fwf("EBCDIC_ZIPCODE", widths = c(32), fileEncoding='ibm500')
>
>
> Where might I go from here?
>
> Related -- some of the files I expect to use will be fairly large (1 GB or
> so). Preferably, I'd like a solution that scales reasonably well. (I tried
> packages like LaF, but they don't have the option to select encoding.)
>
> Thank you very much!
>
>
> Example files --
> https://drive.google.com/open?id=0ByvX1v-WqaaASTdwV2ZYS0pBV00&authuser=0
>
>
I gave this a short try. What tripped me up (see below) is that your file
EBCDIC_ZIPCODE has embedded NUL bytes (\0). My transcript:
> file<-file("EBCDIC_ZIPCODE",encoding="IBM500", raw=TRUE);
> data=read.fwf(file,widths=c(32));
Warning messages:
1: In readLines(file, n = thisblock) :
line 1 appears to contain an embedded nul
2: In readLines(file, n = thisblock) :
incomplete final line found on 'EBCDIC_ZIPCODE'
> View(data)
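I don't know how to get past the embedded NUL bytes. I'm a UNIX user, so my
thought (not applicable given your "pure R" restriction) would be to use
"tr" to convert each \0 to a space, then rerun the above. Since the file is
still EBCDIC at that point, the replacement byte would need to be the EBCDIC
space (0x40), not the ASCII one (0x20).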
--
He's about as useful as a wax frying pan.
10 to the 12th power microphones = 1 Megaphone
Maranatha! <><
John McKown