[R] Converting SAS Data code to R.
David Winsemius
dwinsemius at comcast.net
Sun Sep 27 06:33:53 CEST 2009
I am contemplating bringing in and merging three NHANES-III datasets
from the National Center for Health Statistics that are fixed format
with record length=3348, line counts around 20,000 and described by
SAS DATA steps. I have downloaded and linked similar datasets from the
Continuous NHANES public data releases, but never ones with this many
variables at once. In the prior effort I managed the task by cutting,
pasting, and editing from the SAS code file into a corresponding
read.fwf R call, but the earlier NHANES-III data are far more
voluminous than the more recent "Continuous" version. Does anyone have
experience with such a process and would be willing to share some
advice? The SAS code can be seen here:
ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NHANES/NHANESIII/1A/adult.sas
The DATA step in the main code file starts out:
FILENAME ADULT "D:\Questionnaire\DAT\ADULT.DAT" LRECL=3348;
*** LRECL includes 2 positions for CRLF, assuming use of PC SAS;
DATA WORK;
INFILE ADULT MISSOVER;
LENGTH
SEQN 7
DMPFSEQ 5
DMPSTAT 3
DMARETHN 3
DMARACER 3
DMAETHNR 3
HSSEX 3
The corresponding positions in the INPUT section are
INPUT
SEQN 1-5
DMPFSEQ 6-10
DMPSTAT 11
DMARETHN 12
DMARACER 13
DMAETHNR 14
HSSEX 15
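For the seven variables shown, the INPUT column ranges translate directly into read.fwf widths. The following is only a minimal sketch: the two records are invented for illustration (they are not NHANES data), and the temporary file stands in for ADULT.DAT.

```r
# Variable names and column ranges copied from the INPUT excerpt above.
vars   <- c("SEQN", "DMPFSEQ", "DMPSTAT", "DMARETHN",
            "DMARACER", "DMAETHNR", "HSSEX")
starts <- c(1, 6, 11, 12, 13, 14, 15)
ends   <- c(5, 10, 11, 12, 13, 14, 15)
widths <- ends - starts + 1   # one width per variable

# Two invented 15-character records (NOT real NHANES data).
demo_file <- tempfile(fileext = ".dat")
writeLines(c("000030000121131",
             "000040000222132"), demo_file)

# read.fwf cuts each line into fields of the given widths.
adult <- read.fwf(demo_file, widths = widths, col.names = vars)
unlink(demo_file)
str(adult)
```

For the full file the same call would need all of the widths from the INPUT section, which is where typing them by hand becomes impractical.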
The note about CRLF appears to be implying that those characters are
being counted as part of the length of the first variable, SEQN, but
that there are only 5 meaningful positions. I suppose I can find out
by trial and error how to read such files, but it would save me some
time if anyone in the audience has worked through this on this data
before.
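A note on the reading above: in SAS, LENGTH for a numeric variable sets the number of bytes used to store it, not the columns it occupies in the raw file, so LENGTH SEQN 7 need not conflict with INPUT SEQN 1-5. If the two CRLF positions are simply the line terminator at the end of each record (an assumption worth checking against the actual file), then each line should hold 3346 data characters. The sketch below shows one way to check this kind of thing, using two invented 15-character records rather than the real data.

```r
# Two invented 15-character records, written with explicit CRLF endings.
# Binary mode ("wb") keeps R from translating line endings on any platform.
recs <- c("000030000121131", "000040000222132")
demo_file <- tempfile(fileext = ".dat")
con <- file(demo_file, "wb")
writeLines(recs, con, sep = "\r\n")
close(con)

# Bytes on disk per record include the line ending;
# readLines() strips the line ending before nchar() counts.
bytes_per_record <- file.size(demo_file) / length(recs)
chars_per_line   <- unique(nchar(readLines(demo_file)))
unlink(demo_file)

# Positions taken up by the line terminator.
bytes_per_record - chars_per_line
```

Running the same two measurements on ADULT.DAT and comparing them with LRECL=3348 would show whether the terminator accounts for the difference.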
One thought would be to import the data with the SAS work-alike
program, WPS (which I have not used before), and then to read it in
with read.xport from the foreign package. That would obviate the need
to understand the character position issue, but it would probably take
some time to get up and running and to learn how to use it.
Another thought would be to parse the fixed-width SAS DATA step code
into pieces and build a data.frame, and then extract the row.names,
col.names, and colClasses from that centralized structure.
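The parsing idea can be sketched roughly as follows. parse_input_spec() is a hypothetical helper written for this illustration, not part of any package, and it assumes every specification line looks like "NAME start-end" or "NAME col" as in the excerpt. The real adult.sas may contain other forms (for example character variables marked with $), which this pattern would silently skip.

```r
# Excerpt of the INPUT section, copied from the post.
sas_input <- c("INPUT", "SEQN 1-5", "DMPFSEQ 6-10", "DMPSTAT 11",
               "DMARETHN 12", "DMARACER 13", "DMAETHNR 14", "HSSEX 15")

# Hypothetical helper: turn "NAME start-end" / "NAME col" lines
# into one row per variable with start, end and width.
parse_input_spec <- function(lines) {
  pat <- "^([A-Za-z_][A-Za-z0-9_]*)[[:space:]]+([0-9]+)(-([0-9]+))?$"
  lines <- trimws(lines)
  lines <- lines[grepl(pat, lines)]      # drops "INPUT" and other non-matching lines
  start   <- as.integer(sub(pat, "\\2", lines))
  end_txt <- sub(pat, "\\4", lines)      # "" when only a single column is given
  end     <- start
  has_end <- nzchar(end_txt)
  end[has_end] <- as.integer(end_txt[has_end])
  data.frame(name = sub(pat, "\\1", lines),
             start = start, end = end,
             width = end - start + 1L,
             stringsAsFactors = FALSE)
}

spec <- parse_input_spec(sas_input)

# Using spec$width directly as read.fwf widths assumes the fields are
# contiguous and start in column 1; this check fails loudly otherwise.
stopifnot(all(spec$start == c(1L, head(spec$end, -1) + 1L)))
print(spec)
```

With such a table in hand, spec$name and spec$width could be passed to read.fwf as col.names and widths. Gaps between fields, if any turn up, could be handled with the negative widths that read.fwf accepts for skipped columns.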
David Winsemius, MD
Heritage Laboratories
West Hartford, CT