[R] SAS transport files and the foreign package
Tim Churches
tchur at optushome.com.au
Sun Jan 19 00:23:03 CET 2003
On Sat, 2003-01-18 at 07:45, Frank E Harrell Jr wrote:
> I had no idea how strange the XPORT format really is.
For example, the IBM double precision representation used in XPORT
uses 7 bits for the exponent (a power of 16, not 2) and 56 bits for the
mantissa, whereas IEEE format uses 11 bits for the exponent and 52 bits
for the mantissa.
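
To make the difference concrete, here is a minimal Python sketch of decoding one 8-byte IBM double into a native (IEEE) float. The function name is my own, and it assumes the usual IBM hexadecimal layout: big-endian, 1 sign bit, a 7-bit base-16 exponent biased by 64, and a 56-bit fraction. It makes no attempt to recognise SAS missing-value codes, so treat it as an illustration rather than a complete XPORT reader.

```python
import math
import struct


def ibm_double_to_float(raw: bytes) -> float:
    """Decode an 8-byte big-endian IBM hexadecimal double (illustrative helper)."""
    if len(raw) != 8:
        raise ValueError("expected exactly 8 bytes")
    (bits,) = struct.unpack(">Q", raw)

    sign = -1.0 if bits >> 63 else 1.0
    exponent = (bits >> 56) & 0x7F      # 7-bit exponent, base 16, excess-64
    fraction = bits & ((1 << 56) - 1)   # 56-bit fraction, value = fraction / 2**56

    if fraction == 0:
        return 0.0

    # value = sign * (fraction / 2**56) * 16**(exponent - 64)
    # 16**n == 2**(4n), so the whole scaling is a single power of two.
    return sign * math.ldexp(float(fraction), 4 * (exponent - 64) - 56)


print(ibm_double_to_float(bytes.fromhex("4110000000000000")))  # 1.0
print(ibm_double_to_float(bytes.fromhex("4264000000000000")))  # 100.0
```

Because the IBM fraction has 56 bits and the IEEE mantissa only 52 (plus an implicit leading bit), the conversion can round away a few low-order bits, while the IEEE format covers a much wider exponent range.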
> Following Duncan Temple Lang's suggestion I am contacting one of our
> clients to see what they think about moving towards XML for this.
> My guess is that XML will take a while to be used routinely for
> this and that the sometimes huge datasets involved will cause XML
> files to be monstrous (compression will help but will tax memory
> usage of R at least temporarily during processing).
The nice things about the SAS XML engine are:
a) all the metadata associated with a dataset is included in the
generated XML file, including not just the names of the formats
for each variable (column), but the actual format value labels
themselves.
b) more than one dataset can be included in a single generated XML
export file.
c) like the XPORT format, it is close to foolproof from the SAS user's
point of view, because the SAS XML engine does all the work.
The generated files are indeed huge relative to the amount of actual
data they contain. For our purposes this is unlikely to be a serious
problem: we select and/or summarise data in SAS, and then pass the
subset or summary dataset to R. At the moment we are experimenting
with parsing the SAS XML files with Python and then passing the data
to R via RPy (the Python-to-R bridge), mainly because I am slightly
more adept at writing Python than R. However, the ability of R to
read SAS XML files directly, and to set up categorical SAS variables
that have formats as factor columns in R data.frames, would be
fabulous.
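
As a rough sketch of the Python side of that idea, the fragment below parses a small XML export and turns a coded variable into factor-like integer codes plus level labels. Everything specific here is an assumption made for illustration: the element names (TABLE, PATIENTS, ID, SEX), the one-element-per-row layout, and the SEX_LABELS mapping are hypothetical, and the real layout produced by the SAS XML engine should be checked against an actual export. The step of handing the result to R via RPy is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical export: one child element per row, one grandchild per variable.
SAMPLE = """<TABLE>
  <PATIENTS><ID>1</ID><SEX>1</SEX></PATIENTS>
  <PATIENTS><ID>2</ID><SEX>2</SEX></PATIENTS>
  <PATIENTS><ID>3</ID><SEX>1</SEX></PATIENTS>
</TABLE>"""

# Hypothetical format value labels for the SEX variable.
SEX_LABELS = {"1": "Male", "2": "Female"}


def read_rows(xml_text):
    """Return one dict per row, mapping variable name to its text value."""
    root = ET.fromstring(xml_text)
    return [{cell.tag: (cell.text or "").strip() for cell in row} for row in root]


def to_factor(values, labels):
    """Map raw values to 1-based level codes, like an R factor.

    Values without a label get the code None (comparable to NA in R).
    """
    levels = list(labels.values())
    codes = []
    for value in values:
        label = labels.get(value)
        codes.append(levels.index(label) + 1 if label is not None else None)
    return codes, levels


rows = read_rows(SAMPLE)
codes, levels = to_factor([row["SEX"] for row in rows], SEX_LABELS)
print(codes, levels)
```

The integer codes and the list of levels are the two pieces R needs to rebuild a factor column, which is why the format value labels being present in the XML file matters.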
Tim C