[R] SAS or R software

Frank E Harrell Jr f.harrell at vanderbilt.edu
Sat Dec 18 14:55:58 CET 2004

Marc Schwartz wrote:
> On Fri, 2004-12-17 at 17:11 -0500, Alexander C Cambon wrote: 
>>I apologize for adding this so late to the "SAS or R software" thread.
>>This is a question, not a reply, but it seems to me to fit in well with
>>the subject of this thread.
>>I would like to know anyone's experiences in the following two areas
>>below.  I should add I have no experience myself in these areas:
>>1) Migrating from SAS to R in the choice of statistical software used
>>for FDA  reporting.
> You will find that to be a non-issue from the FDA's perspective. This
> has been discussed here with some frequency.  If you search the archives
> you will find comments from Frank Harrell and others.
> The FDA does not and cannot endorse a particular software product. Nor
> does it validate any statistical software for a specific purpose. They
> do need to be able to reproduce the results, which means they need to
> know what software product was used, which version and on what platform,
> etc.
> The SAS XPORT Transport Format (which is openly defined and documented),
> has been used for the transfer of data sets and has been available in
> many statistical products.
> There have been a variety of activities (CDISC, HL-7, etc) regarding the
> electronic submission of data to the FDA. Some additional information is
> here:
> http://www.fda.gov/cder/regulatory/ersr/default.htm
> and here:
> http://www.cdisc.org/news/index.html
> Any other issues impacting the selection of a particular statistical
> application are more likely to be political within your working
> environment and FUD. 
> As you are likely aware, other statistically relevant issues are
> contained in various ICH guidance documents regarding GCP considerations
> and principles for clinical trials:
> http://www.ich.org/UrlGrpServer.jser?@_ID=475&@_TEMPLATE=272
> Keep in mind also that one big advantage R has (in my mind) is the use
> of Sweave for the reproducible generation of reports, which to an extent
> are self-documenting. 
>>(For example, was there more effort involved in areas of
>>documentation, revision tracking, or validation of software code?)
> Since the FDA's role with computer software and validation has been
> raised before, the following documents cover many of these areas. The
> list is not meant to be exhaustive, but should give a flavor in this
> domain.
> There are specific guidance documents by the FDA pertaining to software
> that is contained in a medical device (i.e., the firmware in a pacemaker
> or medical monitoring equipment) or is used to develop a medical device.
> The current guidance in this case is here:
> http://www.fda.gov/cdrh/comp/guidance/938.html
> Other guidance pertains to 21 CFR 11, which addresses data management
> systems used for clinical trials and covers issues such as electronic
> signatures, audit trails and the like. A guidance document for that is
> here:
> http://www.fda.gov/cder/guidance/5667fnl.htm
> Keep in mind, from a perspective standpoint, that even MS Excel and
> Access can be made to be 21 CFR 11 compliant and there are companies
> whose business is focused on just that task.
> There is also a general guidance document for computer systems used in
> clinical trials here:
> http://www.fda.gov/ora/compliance_ref/bimo/ffinalcct.htm
> Though it is to be superseded by a draft document here:
> http://www.fda.gov/cder/guidance/6032dft.htm 
>>2) Migrating from SAS to R in the choice of statistical software used
>>for NIH reporting (or reporting to other US or non-US government
>>agencies).
> Same here to my knowledge.
> As I was typing this, I see Frank just responded.
> I also just noted Doug's post, so perhaps some of the above information
> will be helpful in clarifying some of his questions as well.
> I believe that the above is factually correct, but if someone knows
> anything to not be so, please correct me.
> HTH,
> Marc Schwartz

In addition to the excellent points made by Marc, Doug, and Matt, I want 
to expand on the revision tracking point originally raised by Alexander.
We use CVS (Concurrent Versions System) for all pharmaceutical industry
work.  Besides allowing two statisticians working on each project to
mirror each other's data and code (for backup when one is out and a
pressing question is asked), the revision control and commented change
tracking of CVS have proven to work incredibly well in this arena.

The one area where we use SAS for pharmaceutical industry work is 
running SAS PROC EXPORT to convert data to CSV format for importing with
the Hmisc package's sasxport.get function (see 
We found that reading binary SAS transport format datasets in R or with 
Stat/Transfer was not reliable enough.  We have a freely available SAS 
macro that runs PROC EXPORT in a loop to get all datasets in a data 
library, with metadata.  That way any SAS exporting errors can be blamed 
on SAS.  Ironically, there is a bug in PROC EXPORT.  When a character
field has an unmatched quote in it, the CSV file can end up with an odd
number of quotes for that field.  sasxport.get checks the number of
records imported against the number reported by PROC CONTENTS, so this
problem is easily detected and corrected with emacs.
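
The record-count check described above can be sketched in a few lines.
This is only an illustration in Python, not the code inside
sasxport.get: the function names (count_csv_records,
lines_with_odd_quotes, check_export) and the sample data are made up for
the example, and the expected count is assumed to come from a trusted
source such as PROC CONTENTS output.

```python
import csv
import io


def count_csv_records(text):
    """Count data records (header excluded) as a CSV parser sees them."""
    rows = list(csv.reader(io.StringIO(text)))
    return max(len(rows) - 1, 0)


def lines_with_odd_quotes(text):
    """Return 1-based numbers of physical lines with an odd number of quotes.

    Heuristic only: a field that legitimately spans several lines would
    also be flagged, so the result is a list of places to inspect by hand.
    """
    return [i for i, line in enumerate(text.splitlines(), start=1)
            if line.count('"') % 2 == 1]


def check_export(text, expected_records):
    """Compare the parsed record count with the independently known count."""
    found = count_csv_records(text)
    return {
        "expected": expected_records,
        "found": found,
        "ok": found == expected_records,
        "suspect_lines": lines_with_odd_quotes(text),
    }


# Hypothetical export of three records; the second has an unmatched quote.
BAD_EXPORT = 'id,comment\n1,"fine"\n2,"unmatched\n3,"ok"\n'

if __name__ == "__main__":
    print(check_export(BAD_EXPORT, expected_records=3))
```

The unmatched quote makes the parser swallow the following line into the
same field, so fewer records come back than were written.  Comparing
against a count obtained independently of the CSV file is what exposes
the problem; the odd-quote scan only narrows down where to look.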

Note that, with literally billions of dollars at its disposal, SAS did
not take the time to write PROC EXPORT as a true procedure.  Like the R
sas.get function, it generates voluminous SAS DATA step code to do the
work.

Regarding CDISC, the SAS transport format that is now accepted by FDA is
deficient: there is no place for certain metadata (e.g., units of
measurement), value labels are stored remotely from the datasets, and
variable names are truncated to 8 characters.  The preferred format for
CDISC will become XML.

Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University
