[Rd] Reading a repeated fixed format

Douglas Bates dmbates at gmail.com
Wed Aug 10 22:41:39 CEST 2005


The Harwell-Boeing format for exchanging matrices is one of those
lovely legacy formats that is based on fixed-format Fortran
specifications and 80-character records.  (Those of you who don't know
why they would be 80 characters instead of, say, 60 or 100 can ask one
of us old-timers someday and we'll tell you long, boring stories
about working with punched cards.)

Reading this format would take about 10 lines of R code if it were not
for the fact that it allows things like 40 two-digit integers to be
written as one 80-character record with no separators.  This actually
made sense to some people once upon a time.
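
For a concrete picture of such a record, here is a small illustration with made-up values (not taken from a real Harwell-Boeing file). In Fortran terms a line like this would be written with an edit descriptor such as (40I2). In R a comparable record can be built by formatting each value into a two-character field and pasting the fields together with no separator.

```r
# Hypothetical values: 40 integers between 10 and 88, so each needs exactly two digits
vals <- seq(10L, 88L, by = 2L)
length(vals)            # 40

# Format each value into a width-2 field (like Fortran I2) and join with no separator
record <- paste(sprintf("%2d", vals), collapse = "")
nchar(record)           # 80
substr(record, 1, 10)   # "1012141618"
```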

I could use read.fwf or, better, borrow some of the code in the read.fwf
function to extract the strings that should have been separated and
convert them to numeric values, but I have been trying to think of a
more clever way of doing this.  I know the number of records and the
number of elements to read and, if it would help, I can assemble the
records into one long text string.
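
For comparison, a minimal sketch of the read.fwf route, assuming the records are already in a character vector (the name `records` and its contents are hypothetical). read.fwf accepts a connection, so the lines can be passed through textConnection() with one width per field. Each record becomes one row, so the rows are read back in order.

```r
# Hypothetical input: two 80-character records, each packing 40 two-digit integers
records <- c(paste(sprintf("%2d", 10:49), collapse = ""),
             paste(sprintf("%2d", 50:89), collapse = ""))
n <- 80  # number of values known in advance

# read.fwf splits every line into fields of the given widths (40 fields of width 2)
con <- textConnection(records)
fields <- read.fwf(con, widths = rep(2, 40))
close(con)

# Each row of 'fields' is one record; transpose so values come out record by record
vals <- as.vector(t(as.matrix(fields)))
# A short final record would leave empty fields at the end; keep only the n values expected
vals <- head(vals, n)
vals[c(1, 40, 41, 80)]   # 10 49 50 89
```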

Can anyone think of a vectorized way to extract successive substrings
of length k or, perhaps, a way to use regular expressions to insert a
blank after every k characters?
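
Both ideas can be sketched in a few lines of base R. This assumes (hypothetically) a field width k = 2, a known count n = 80, and the records already pasted into one string s. substring() is vectorized over its first and last arguments, so a vector of start positions extracts every field in one call. For the regular-expression route, gsub() with a pattern matching k arbitrary characters inserts the blank, and scan() then splits on whitespace.

```r
k <- 2L    # field width (hypothetical: two-digit integers)
n <- 80L   # number of values known in advance (hypothetical)
# Hypothetical input: the records already pasted into one long string
s <- paste(sprintf("%2d", 10:89), collapse = "")

# (a) Vectorized substrings: substring() recycles over vectors of start/stop positions
starts <- seq(1, by = k, length.out = n)
v1 <- as.integer(substring(s, starts, starts + k - 1))

# (b) Regular expression: insert a blank after every k characters, then split on blanks
spaced <- gsub(sprintf("(.{%d})", k), "\\1 ", s)
v2 <- scan(text = spaced, quiet = TRUE)

head(v1)                        # 10 11 12 13 14 15
identical(v1, as.integer(v2))   # TRUE
```

The regular-expression route assumes no field is entirely blank, since scan() would skip such a field; the substring() route has no such problem. Both rely on the fields exactly filling each record, as 40 two-digit integers fill all 80 columns. If a format leaves unused columns at the end of each record, each record would first need to be cut to the used width, for instance with substr(records, 1, used_width) where used_width is a hypothetical name for that width, before pasting.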