[R] Fixed Width EBCDIC Files in R
Brian Trautman
btrautman84 at gmail.com
Thu Feb 5 21:08:51 CET 2015
I'm trying to read some mainframe data encoded as EBCDIC into R, and am at
a loss. I'd like to avoid using an external program to convert the files,
since I'm operating in a corporate environment.
You can find the example files at the link below, in both ASCII and
EBCDIC versions. Note that there are no line breaks in the EBCDIC versions
of the files; instead, I'd be specifying the width of each record manually.
R has the IBM500 encoding available in my environment, which should be the
correct one for these files.
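Before touching the real files, you can check that the IBM500 converter actually works by decoding a few bytes whose meaning is known. This is a minimal sketch. The byte values come from the standard EBCDIC code page 500 table, where A to C are 0xC1 to 0xC3 and the digits 1 to 3 are 0xF1 to 0xF3. The variable names are only illustrative.

```r
# Bytes for "ABC123" in EBCDIC code page 500 (IBM500):
# A-C are 0xC1-0xC3 and the digits 1-3 are 0xF1-0xF3.
sample_bytes <- as.raw(c(0xC1, 0xC2, 0xC3, 0xF1, 0xF2, 0xF3))

# iconv() accepts a list of raw vectors as input; an unsupported
# conversion raises an error, which is turned into NA here.
decoded <- tryCatch(
  iconv(list(sample_bytes), from = "IBM500", to = "UTF-8"),
  error = function(e) NA_character_
)
print(decoded)
```

If this prints "ABC123", the encoding side is fine and any remaining problem lies in how the file is being split into records.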
However, when I run the following commands, R seems to fail entirely: it
loads a single record of garbage characters, regardless of which encoding I
specify.
layout <- read.fwf("EBCDIC_LAYOUT", widths = c(80), fileEncoding='ibm500')
data <- read.fwf("EBCDIC_ZIPCODE", widths = c(32), fileEncoding='ibm500')
Where might I go from here?
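One likely cause of the single record is how read.fwf() works. According to its documentation, `widths` gives the widths of the fields within one line, and the input is still split into records at line breaks. Because the EBCDIC file has no line breaks, the whole file becomes a single "line", and `widths = c(80)` keeps only its first 80 characters as one field.

A minimal sketch of an alternative is to read the raw bytes with readBin(), cut them into fixed-length records, and decode each record with iconv(). This assumes every field is plain character (display) data in IBM500. Packed-decimal (COMP-3), binary, or NUL-filled fields are not handled. The helper name `read_ebcdic_records` is hypothetical.

```r
# Read an EBCDIC file made of fixed-length records with no line breaks.
# Hypothetical helper; assumes every byte is character (display) data in
# code page 500, so packed-decimal (COMP-3), binary, or NUL bytes are not handled.
read_ebcdic_records <- function(path, record_length, from = "IBM500") {
  con <- file(path, open = "rb")
  on.exit(close(con))
  bytes <- readBin(con, what = "raw", n = file.size(path))

  n_records <- length(bytes) %/% record_length
  if (length(bytes) %% record_length != 0) {
    warning("file size is not a multiple of record_length; trailing bytes ignored")
  }
  if (n_records == 0) return(character(0))

  # One column per record, then one raw vector per record
  m <- matrix(bytes[seq_len(n_records * record_length)], nrow = record_length)
  records <- lapply(seq_len(n_records), function(j) m[, j])

  # iconv() accepts a list of raw vectors and returns a character vector
  iconv(records, from = from, to = "UTF-8")
}

# Usage with the files from this post (record lengths as given above):
# layout <- read_ebcdic_records("EBCDIC_LAYOUT", 80)
# zips   <- read_ebcdic_records("EBCDIC_ZIPCODE", 32)
```

Each element of the result is one decoded record. You can then cut it into fields with substring(), using offsets from the layout file, or pass it back to read.fwf() through textConnection().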
Related: some of the files I expect to use will be fairly large (around
1 GB), so I'd prefer a solution that scales reasonably well. (I tried
packages such as LaF, but they don't offer an option to select the encoding.)
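For files around 1 GB, the same readBin() plus iconv() idea can be applied one chunk at a time, so only one chunk of raw bytes sits in memory. This is a minimal, unbenchmarked sketch under the same assumptions: IBM500, character-only fields, and no line breaks. `process_ebcdic_file`, `callback`, and `records_per_chunk` are hypothetical names. Each chunk holds a whole number of records, so chunk boundaries always line up with record boundaries.

```r
# Stream a large fixed-record-length EBCDIC file in chunks, so that only one
# chunk of raw bytes is in memory at a time. Assumes every field is character
# data in code page 500 (IBM500); packed or binary fields need separate handling.
# The function and argument names are hypothetical.
process_ebcdic_file <- function(path, record_length, callback,
                                records_per_chunk = 100000,
                                from = "IBM500") {
  chunk_bytes <- record_length * records_per_chunk
  con <- file(path, open = "rb")
  on.exit(close(con))
  repeat {
    bytes <- readBin(con, what = "raw", n = chunk_bytes)
    n_records <- length(bytes) %/% record_length
    if (n_records == 0) break  # end of file (or only a partial record left)
    # Lay the chunk out as a matrix with one record per column
    m <- matrix(bytes[seq_len(n_records * record_length)], nrow = record_length)
    records <- lapply(seq_len(n_records), function(j) m[, j])
    # Hand the decoded records for this chunk to the caller
    callback(iconv(records, from = from, to = "UTF-8"))
    if (length(bytes) < chunk_bytes) break  # that was the last, short chunk
  }
  invisible(NULL)
}
```

The callback could, for instance, split each record into fields with substring() and append the result to a file or database table. That way the full data set never has to be held in memory as one data frame.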
Thank you very much!
Example files:
https://drive.google.com/open?id=0ByvX1v-WqaaASTdwV2ZYS0pBV00&authuser=0
More information about the R-help
mailing list