[R] SAS or R software
Frank E Harrell Jr
f.harrell at vanderbilt.edu
Sat Dec 18 14:55:58 CET 2004
Marc Schwartz wrote:
> On Fri, 2004-12-17 at 17:11 -0500, Alexander C Cambon wrote:
>>I apologize for adding this so late to the "SAS or R software"
>>discussion. This is a question, not a reply, but it seems to me to fit
>>in well with the subject of this thread.
>>I would like to hear about anyone's experiences in the following two
>>areas. I should add that I have no experience myself in these areas:
>>1) Migrating from SAS to R in the choice of statistical software used
>>for FDA reporting.
> You will find that to be a non-issue from the FDA's perspective. This
> has been discussed here with some frequency. If you search the archives
> you will find comments from Frank Harrell and others.
> The FDA does not and cannot endorse a particular software product. Nor
> does it validate any statistical software for a specific purpose. They
> do need to be able to reproduce the results, which means they need to
> know what software product was used, which version and on what platform.
> The SAS XPORT Transport Format (which is openly defined and documented)
> has been used for the transfer of data sets and has been available in
> many statistical products.
> There have been a variety of activities (CDISC, HL7, etc.) regarding the
> electronic submission of data to the FDA. Some additional information is
> and here:
> Any other issues impacting the selection of a particular statistical
> application are more likely to be political within your working
> environment and FUD.
> As you are likely aware, other statistically relevant issues are
> contained in various ICH guidance documents regarding GCP considerations
> and principles for clinical trials:
> Keep in mind also that one big advantage R has (in my mind) is the use
> of Sweave for the reproducible generation of reports, which to an extent
> are self-documenting.
>> (For example, was there more effort involved in the areas of
>>documentation, revision tracking, or validation of software code?)
> Since the FDA's role with computer software and validation has been
> raised before, the following documents cover many of these areas. The
> list is not meant to be exhaustive, but should give a flavor of the area.
> There are specific guidance documents by the FDA pertaining to software
> that is contained in a medical device (i.e., the firmware in a pacemaker
> or medical monitoring equipment) or is used to develop a medical device.
> The current guidance in this case is here:
> Other guidance pertains to 21 CFR 11, which addresses data management
> systems used for clinical trials and covers issues such as electronic
> signatures, audit trails and the like. A guidance document for that is
> Keep in mind, for perspective, that even MS Excel and Access can be
> made 21 CFR 11 compliant, and there are companies whose business is
> focused on just that task.
> There is also a general guidance document for computer systems used in
> clinical trials here:
> Though it is to be superseded by a draft document here:
>>2) Migrating from SAS to R in the choice of statistical software used
>>for NIH reporting (or reporting to other US or non-US government agencies).
> Same here to my knowledge.
> As I was typing this, I see Frank just responded.
> I also just noted Doug's post, so perhaps some of the above information
> will be helpful in clarifying some of his questions as well.
> I believe that the above is factually correct, but if anyone knows
> otherwise, please correct me.
> Marc Schwartz
In addition to the excellent points made by Marc, Doug, and Matt, I want
to expand on the revision tracking point originally raised by Alexander.
We use CVS for all pharmaceutical industry work. Besides allowing two
statisticians working on each project to mirror each other's data and
code (for backup when one is out and a pressing question is asked), the
revision control and commented change tracking of CVS have proven to work
incredibly well in this arena.
The one area where we use SAS for pharmaceutical industry work is
running SAS PROC EXPORT to convert data to CSV format for importing with
the Hmisc package's sasxport.get function (see
We found that reading binary SAS transport format datasets in R or with
Stat/Transfer was not reliable enough. We have a freely available SAS
macro that runs PROC EXPORT in a loop to get all datasets in a data
library, with metadata. That way any SAS exporting errors can be blamed
on SAS. Ironically, there is a bug in PROC EXPORT: when a character
field contains an unmatched quote, the CSV file can end up with an odd
number of quotes for that field. sasxport.get checks the number of
records imported against the number reported by PROC CONTENTS, so this
problem is easily detected and can then be corrected by hand in emacs.
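To make the two checks concrete, here is a minimal Python sketch (not how sasxport.get is implemented) that flags CSV lines with an odd number of double quotes and compares the parsed record count with an expected count, which stands in for the number PROC CONTENTS reports. The function names odd_quote_lines and check_record_count are hypothetical, and the sketch assumes the first CSV row is a header.

```python
import csv
import io


def odd_quote_lines(text):
    """Return 1-based physical line numbers whose count of double quotes is odd.

    In a well-formed CSV every quoted field opens and closes, so an odd count
    on one line suggests an unmatched quote. A legitimate quoted field that
    spans several lines is also flagged, so treat hits as "look here" hints.
    """
    return [i for i, line in enumerate(text.splitlines(), start=1)
            if line.count('"') % 2 == 1]


def check_record_count(text, expected):
    """Parse CSV text (first row assumed to be a header) and count data rows.

    `expected` is a stand-in for the observation count reported by
    PROC CONTENTS. Returns (matches, actual_count).
    """
    rows = list(csv.reader(io.StringIO(text)))
    actual = len(rows) - 1 if rows else 0
    return actual == expected, actual
```

With an unmatched quote, the CSV parser treats everything after the stray quote as one long field, so fewer records come back than expected. For example, for the text `'id,comment\n1,"5 inch ruler\n2,ok\n3,ok\n'` with an expected count of 3, check_record_count reports a mismatch and odd_quote_lines points at line 2, which is the line to fix by hand.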
Note that with literally billions of dollars at their disposal, SAS
didn't take the time to really write a procedure for PROC EXPORT. Like
the R sas.get function, it generates voluminous SAS DATA step code to do
Regarding CDISC, the SAS transport format that is now accepted by the FDA
is deficient: there is no place for certain metadata (e.g., units of
measurement), value labels are stored remotely from the datasets, and
variable names are truncated to 8 characters. The preferred format for
CDISC will
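As a rough illustration of why the 8-character limit on variable names matters, the small Python sketch below reports which names would collide if they were simply truncated. The helper truncation_collisions is hypothetical and not part of any SAS or Hmisc API; real export tools may rename variables differently. The case-insensitive comparison reflects the fact that SAS variable names are not case-sensitive.

```python
from collections import defaultdict


def truncation_collisions(names, limit=8):
    """Group names that become identical after truncation to `limit` characters.

    Returns a dict mapping each colliding truncated name (upper-cased, since
    SAS variable names are case-insensitive) to the original names that map
    to it. Names that stay unique after truncation are omitted.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name[:limit].upper()].append(name)
    return {short: longs for short, longs in groups.items() if len(longs) > 1}
```

For instance, `truncation_collisions(["visitdate1", "visitdate2", "age"])` returns `{"VISITDAT": ["visitdate1", "visitdate2"]}`, showing two distinct variables that would be indistinguishable after truncation.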
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University