[Rd] Problem with read.xport() from foreign package (PR#7389)
Werner Engl
englw at gmx.net
Thu Dec 9 14:33:07 CET 2004
Dear R-devel list,
This is to confirm Prof. Ripley's analysis of the
read.xport issue.
The section on missing data in TS140 is pertinent
to numeric variables only. In SAS, character
variables are of fixed length (between 1 and 200
for the xport format). Shorter strings are padded
with trailing blanks when assigned to a variable.
An uninitialized character variable is stored as
all blanks in the xport format file. This is the
only representation of 'missing' data for SAS
character variables. 'Special missing' codes
(.A to .Z and ._) are available for numeric
variables only.
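To make the distinction concrete, here is a rough sketch in C of how the two kinds of 'missing' could be told apart when reading raw field bytes. It is not the code in SASxport.c; the function names are hypothetical, and the numeric layout (the missing code in the first byte, all remaining bytes zero) is my reading of TS140 and should be treated as an assumption.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if a fixed-length character field
 * consists only of blanks, which is the sole 'missing' representation
 * for SAS character variables in an xport file. */
int xpt_char_is_blank(const unsigned char *field, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (field[i] != ' ')
            return 0;
    return 1;
}

/* Hypothetical helper: returns the missing code of a numeric field
 * ('.', '_' or 'A'..'Z'), or 0 if the field is not a missing value.
 * Assumption (from TS140): the first byte carries the code and every
 * following byte is 0x00. Assumes an ASCII execution character set,
 * so that 'A'..'Z' is a contiguous range. */
char xpt_numeric_missing_code(const unsigned char *field, size_t len)
{
    if (len == 0)
        return 0;

    unsigned char c = field[0];
    if (c != '.' && c != '_' && !(c >= 'A' && c <= 'Z'))
        return 0;

    for (size_t i = 1; i < len; i++)
        if (field[i] != 0x00)
            return 0;

    return (char)c;
}
```

Note that a character field beginning with '.' or 'A' is ordinary data, so only xpt_char_is_blank() would ever apply to it; the numeric test is meaningful for numeric fields only.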
Please find enclosed a patch to the
R-2.0.1/src/library/Recommended/foreign/SASxport.c
file and an xport file that I used for testing. The
xport file was created by SAS V8.2 on Linux, but
should be platform and version independent (except
for the header information). I have simply commented
out the code lines that try to detect missing character
values.
The code in SASxport.c already does a good job of
removing trailing blanks from character values.
For missing character data (all blanks) the result
is the empty string (""), which is fine for me.
There is no equivalent to the R missing character
representation in SAS (as far as I know).
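The behaviour described above can be illustrated with a minimal sketch. This is not the actual routine from SASxport.c; the function name and signature are made up for illustration. It copies a fixed-length field into a NUL-terminated buffer without its trailing blanks, so an all-blanks field naturally comes out as "".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies a fixed-length xport character field
 * into 'out', dropping trailing blanks, and NUL-terminates the result.
 * 'out' must have room for at least len + 1 bytes.
 * Returns the length of the trimmed string; an all-blanks field
 * yields the empty string and a return value of 0. */
size_t xpt_trim_char_field(const char *field, size_t len, char *out)
{
    /* Walk back from the end of the field over the padding blanks. */
    while (len > 0 && field[len - 1] == ' ')
        len--;

    memcpy(out, field, len);
    out[len] = '\0';
    return len;
}
```

Leading and embedded blanks are preserved; only the padding at the end of the field is removed.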
The enclosed gzipped tar file contains:

  diff_SASxport_c.txt
      diff for SASxport.c
  xptchar1.xpt
      test file in xport format
  xptchar.sas
      trivial SAS program used to generate xptchar1.xpt
  xptchar_SAS_System_Viewer9_1.csv
      xptchar1.xpt converted to a comma-separated file using
      SAS System Viewer 9.1 (on Win XP)
With the patch applied, read.xport produces the same
data frame from xptchar1.xpt as read.csv does from
xptchar_SAS_System_Viewer9_1.csv (tested on i386 Linux
with R Version 2.0.1) except that read.csv converts empty
strings to NAs. As explained above, the empty string is
closer to the meaning of an all-blanks value in SAS.
There is renewed interest in this old data format in
the pharmaceutical industry, because the US Food and
Drug Administration requests clinical and
pre-clinical data to be submitted in this format. I
spent some time analyzing the xport file format to
be sure of what is actually submitted to FDA with
these files.
Thank you for considering this patch (and for the
great R system, of course)!
Best regards,
Werner Engl
_____________________________________
Werner Engl, PhD, CStat
Senior Manager, Biostatistics
Baxter AG, Vienna, Austria
e-mail: werner_engl at baxter.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PR7389_we20041209.tar.gz
Type: application/x-gzip
Size: 1727 bytes
Desc: not available
Url : https://stat.ethz.ch/pipermail/r-devel/attachments/20041209/0f297125/PR7389_we20041209.tar.gz