[R] FDA and ICH Compliance of R

Frank E Harrell Jr feh3k at spamcop.net
Fri Nov 28 01:11:06 CET 2003

On Thu, 27 Nov 2003 09:59:57 -0500 (EST)
"Gabor Grothendieck" <ggrothendieck at myway.com> wrote:

> From: Frank E Harrell Jr <feh3k at spamcop.net>
> > per year in SAS licenses and have to hire armies of non-intellectually
> > challenged SAS programmers to do the work of significantly fewer
> > programmers that use modern statistical computing tools like R and
> > S-Plus, it is surprising that SAS is still the most commonly used tool
> > in the clinical side of drug development. I quit using SAS in 1991
> > because my productivity jumped at least 20% within one month of using
> > S-Plus.
> I have not used SAS for even longer than you but to
> give SAS its due:
> - it's pretty easy to produce all the info you need for a
>   complete analysis with a few SAS commands.  It would be
>   possible to create analogous R commands but as it stands
>   you have to keep going back and forth with R rather than
>   just get it all out at once like you can with SAS.

Thanks for your note Gabor.  It depends on what you mean by "complete
analysis".  SAS often would give me things I didn't need but was and is
short on modern methods.  But to address the needs I think you are getting
at, this is the reason I developed the Hmisc package (especially

> - SAS has more functionality in missing values.  You
>   can have different types of SAS missing values but in R you
>   can have only one type of missing value.

Several points here.  First, I always liked the 27 levels of missing that
SAS supported, but I've never seen a pharmaceutical company actually use
more than the standard missing (.).  Second, you can easily implement them
in R and S-Plus anyway; the sas.get function in Hmisc imports all SAS
special missing values and lets you work with them (e.g.,
is.special.miss(x, 'B')) while treating all of them as NA in standard
calculations.  In S you can add your own attributes on the fly (as long as
you don't use the new class mechanism) so you can do things much more
generally than with SAS.  For example, I can add 'comment' attributes and
attributes documenting file names containing the image of the case report
form page containing the variable, etc.  When you get to missing value
imputation, S has more methods available than SAS.
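
A minimal sketch of the attribute idea, using base R only (not the Hmisc
implementation). The variable, the attribute names "special_miss" and
"crf_image", and the file name are made up for illustration:

```r
# Sketch only: base R, no Hmisc required. All names here are hypothetical.
sbp <- c(120, NA, 135, NA, 142)

# Record which kind of missing each NA stands for (SAS-style letter codes).
# "special_miss" is an attribute name invented for this example.
attr(sbp, "special_miss") <- c("", "B", "", "A", "")

# Free-form documentation attached to the variable itself.
comment(sbp) <- "Systolic blood pressure, seated, mmHg"
attr(sbp, "crf_image") <- "crf_page_07.tif"   # hypothetical file name

# Standard calculations see plain NA ...
mean_sbp <- mean(sbp, na.rm = TRUE)

# ... while the special codes remain available when needed.
is_miss_B <- is.na(sbp) & attr(sbp, "special_miss") == "B"
n_miss_B  <- sum(is_miss_B)
```

Note that plain subsetting such as sbp[1:3] drops attributes like these, so
real use needs something more careful than this sketch.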

> - the BY phrase in SAS is incredibly powerful and handy.  You
>   can get the same effect in R but I think that specific
>   functionality is easier with SAS.

Again I'll have to respectfully disagree.  BY in SAS is very good for
within-procedure repetition of analyses, but not between procedures.  And
if you need any SAS PROC IML code to do customized matrix programming, you
lose the ability to do by-processing.  In S you can put any number of
things within a loop, with an easier-to-use mechanism for collecting the

> Obviously R is incredibly powerful and functional and I really
> am out of touch with the SAS world but I thought I would make
> whatever case I could.  I am willing to be corrected by those
> more in the know with SAS if this is wrong.

My view is that SAS is best at handling massive databases when you need
standard (i.e., older) methods run, and SAS is very good at getting
P-values in mixed effects models.  Other than that, S is better in almost
every way.  Over the years I've developed documents demonstrating how to
do data manipulation in S (yes, S is superior to SAS for this task) and
how to make semi-advanced statistical reports and fairly complex tables. 
Granted, the learning curve for S is not shallow but the payoff is great
in terms of productivity, beauty of output (when coupling S with LaTeX)
and availability of modern applied statistical methods.

Frank E Harrell Jr    Professor and Chair            School of Medicine
                      Department of Biostatistics    Vanderbilt University
