[Rd] read.xport and lookup.xport in foreign (PR#2385)
fharrell@virginia.edu
Fri Dec 20 18:48:02 2002
Under

platform i686-pc-linux-gnu
arch     i686
os       linux-gnu
system   i686, linux-gnu
status
major    1
minor    6.1
year     2002
month    11
day      01
language R
and foreign 0.5-8, I am encountering errors with read.xport. Here's SAS code for producing transport files for testing:
libname x SASV5XPT "test.xpt";
libname y SASV5XPT "test2.xpt";
PROC FORMAT; VALUE race 1=green 2=blue 3=purple; RUN;
PROC FORMAT CNTLOUT=format;RUN;
data test;
LENGTH race 3 age 4;
age=30; label age="Age at Beginning of Study";
race=2;
d1='3mar2002'd ;
dt1='3mar2002 9:31:02'dt;
t1='11:13:45't;
output;
age=31;
race=4;
d1='3jun2002'd ;
dt1='3jun2002 9:42:07'dt;
t1='11:14:13't;
output;
format d1 mmddyy10. dt1 datetime. t1 time. race race.;
run;
PROC COPY IN=work OUT=x;SELECT test;RUN;
PROC COPY IN=work OUT=y;SELECT test format;RUN;
SAS output:
NOTE: Copying WORK.TEST to X.TEST (memtype=DATA).
NOTE: There were 2 observations read from the data set WORK.TEST.
NOTE: The data set X.TEST has 2 observations and 5 variables.
NOTE: PROCEDURE COPY used:
      real time           1.52 seconds
      cpu time            0.04 seconds
NOTE: Copying WORK.TEST to Y.TEST (memtype=DATA).
NOTE: There were 2 observations read from the data set WORK.TEST.
NOTE: The data set Y.TEST has 2 observations and 5 variables.
NOTE: Copying WORK.FORMAT to Y.FORMAT (memtype=DATA).
NOTE: There were 3 observations read from the data set WORK.FORMAT.
NOTE: The data set Y.FORMAT has 3 observations and 21 variables.
NOTE: PROCEDURE COPY used:
R results:
> library(foreign)
> read.xport('test.xpt')
      RACE      AGE    D1        DT1    T1
1 2.000063 30.00000 15402 1330767062 40425
2 4.000063 31.00000 15494 1338716527 40453
Note the corruption of RACE (a variable with a SAS length of 3 bytes): the values should be exactly 2 and 4.
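For illustration, a short Python sketch (not code from foreign; `ibm_to_float` is a hypothetical name) of how SAS V5 transport numerics decode. They are big-endian IBM System/360 hex floats, and a variable stored with LENGTH 3 keeps only its first 3 bytes, so a reader has to zero-pad them back to 8 before decoding. If instead 8 bytes are decoded starting at RACE's offset, AGE's 4 bytes and the first byte of D1 end up in RACE's mantissa, and the result matches the printed 2.000063. That is consistent with, though it does not prove, a missing padding step.

```python
import struct

def ibm_to_float(raw: bytes) -> float:
    """Decode a SAS XPORT numeric (big-endian IBM System/360 hex float).

    Variables stored with LENGTH < 8 keep only their leading bytes, so the
    dropped low-order mantissa bytes are restored as zeros first.
    """
    if not 2 <= len(raw) <= 8:
        raise ValueError("SAS numeric lengths run from 2 to 8 bytes")
    buf = raw.ljust(8, b"\x00")              # zero-pad truncated values
    if buf[0] != 0 and buf[1:] == bytes(7):
        return float("nan")                  # SAS missing: '.', '._', '.A'..'.Z'
    (word,) = struct.unpack(">Q", buf)
    sign = -1.0 if word >> 63 else 1.0
    exponent = (word >> 56) & 0x7F           # excess-64 power of 16
    fraction = (word & ((1 << 56) - 1)) / float(1 << 56)
    return sign * fraction * 16.0 ** (exponent - 64)

# Observation 1 of TEST as packed in the file: RACE (3 bytes), AGE (4 bytes),
# then the first byte of D1.
race_bytes = bytes([0x41, 0x20, 0x00])           # race = 2
age_bytes = bytes([0x42, 0x1E, 0x00, 0x00])      # age = 30
d1_first_byte = bytes([0x44])                    # d1 = 15402 has exponent byte 0x44

print(ibm_to_float(race_bytes))                               # zero-padded: 2.0
print(ibm_to_float(race_bytes + age_bytes + d1_first_byte))   # 8-byte overrun
```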
> read.xport('test2.xpt')
            RACE          AGE           D1          DT1           T1
1   2.000063e+00 3.000000e+01 1.540200e+04 1.330767e+09 4.042500e+04
2   4.000063e+00 3.100000e+01 1.549400e+04 1.338717e+09 4.045300e+04
3   3.687825e-40 3.687825e-40 3.687825e-40 3.687896e-40 5.962240e+20
...
124 3.835229e-93 6.434447e-86           NA 3.687825e-40 3.687825e-40
Note the corrupted data when reading a SAS transport file that contains more than one SAS dataset: TEST has only 2 observations, yet 124 rows come back, and the extra rows are garbage. According to the documentation, read.xport should handle this case and return a list of data frames.
> names(lookup.xport('test2.xpt'))
[1] "TEST"
Note that only one of the two datasets is listed (TEST, but not FORMAT).
Also, I would greatly benefit from having lookup.xport return all of the SAS variable attributes, especially the variable label and format name. I could then write a little function for the community that makes read.xport as comprehensive as read.spss in creating factor variables and variable labels, provided the user exports the PROC FORMAT CNTLOUT= dataset.
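A rough Python sketch of the post-processing such attributes would allow, for illustration only. `value_labels` and `decode` are hypothetical names. The rows are assumed to have been read from the PROC FORMAT CNTLOUT= dataset (FMTNAME, START and LABEL are among its 21 variables). Only single-value ranges are handled, and the set of date formats is a small illustrative subset. SAS dates count days from 1 January 1960, datetimes count seconds from midnight of that day, and times count seconds since midnight, which is why D1, DT1 and T1 above print as 15402, 1330767062 and 40425.

```python
from datetime import date, datetime, timedelta
from typing import Dict, List

SAS_EPOCH = datetime(1960, 1, 1)                        # origin for SAS dates and datetimes
DATE_FORMATS = {"MMDDYY", "DDMMYY", "YYMMDD", "DATE"}  # illustrative subset only

def base_name(fmt: str) -> str:
    """Strip width and period: 'MMDDYY10.' -> 'MMDDYY', 'race.' -> 'RACE'."""
    return fmt.upper().rstrip("0123456789.")

def value_labels(cntlout_rows: List[Dict[str, str]]) -> Dict[str, Dict[float, str]]:
    """Map format name -> {numeric code: label} from CNTLOUT= rows.

    Assumes each row is a single value (START equal to END), as in RACE.
    """
    labels: Dict[str, Dict[float, str]] = {}
    for row in cntlout_rows:
        codes = labels.setdefault(row["FMTNAME"].strip().upper(), {})
        codes[float(row["START"])] = row["LABEL"].strip()
    return labels

def decode(value: float, fmt: str, labels: Dict[str, Dict[float, str]]):
    """Convert one raw SAS value according to its format name."""
    name = base_name(fmt)
    if name in DATE_FORMATS:                  # days since 1960-01-01
        return (SAS_EPOCH + timedelta(days=value)).date()
    if name == "DATETIME":                    # seconds since 1960-01-01 00:00:00
        return SAS_EPOCH + timedelta(seconds=value)
    if name == "TIME":                        # seconds since midnight
        return timedelta(seconds=value)
    if name in labels:                        # user format: code -> label
        return labels[name].get(float(value), value)  # unmapped codes pass through
    return value

# Rows as they might look after reading the FORMAT member of test2.xpt.
race_rows = [
    {"FMTNAME": "RACE", "START": "1", "LABEL": "green"},
    {"FMTNAME": "RACE", "START": "2", "LABEL": "blue"},
    {"FMTNAME": "RACE", "START": "3", "LABEL": "purple"},
]
labels = value_labels(race_rows)
print(decode(2, "RACE.", labels))             # blue
print(decode(15402, "MMDDYY10.", labels))     # 2002-03-03
```

Codes without a label (such as race=4 in the test data) are returned unchanged rather than turned into missing values.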
Thanks.
--
Frank E Harrell Jr Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine http://hesweb1.med.virginia.edu/biostat