[R] Re: Validation of R

Fri Apr 18 20:58:55 CEST 2003

Like the original poster, I'm in a corporation that interacts with the FDA
(submissions for product approval, and potential for auditing of QC
procedures).  I fully expect to be asked to validate R, in some sense,
within the year, maybe two.  I have two main comments.

First, I would be interested in participating in a small sub-project that
explores this in very practical ways, such as
1.   Documenting resistance or regulatory needs R users are encountering in
this environment, offline from the r-help list,
2.   Sharing experiences (what works and what doesn't for assuaging
managers' fears), and
3.   If any further validation activities are deemed helpful (such as
additional test cases and describing what the test cases are intended to
test), making sure that these activities are fed back to the R project in a
way that others can leverage them in the future.

If you would also like to participate in this off-line discussion, I will
be happy to collect names and e-mails.  Or, if anyone has other ideas or
feels motivated to drive something, feel free to step forward.

Second, just minutes ago I raised this question with our software tester
over lunch.  She tests SAS code used to generate reports of clinical trial
results, and other software used to get clinical data into a database.  In
retrospect she is a biased sample (of size 1!) because the open-source
software model de-emphasizes the role (and value) of the professional
software tester; nonetheless I thought her comments offer a taste of the
opposition some may encounter.  I'll tell you what she said, and then I'll
offer my impressions; please don't argue with her points, because I already
did!

(A bit of background:  we have chosen not to validate SAS procedures, and
we say so in our test documentation.  In practice, I think our clinical
reporting rarely strays far from base SAS--99% of our reporting is just
manipulating and tabulating data--and that may be a reason for the
decision.)

In a nutshell, she thought SAS was more trustworthy than R (to the extent
that she thought we should test R's functions) based on two points:
1.   SAS has a team of professional software testers who spend their time
coming up with test cases that are as esoteric and odd as they can think of
(within the limits of their specifications).  She was not convinced that a
large community of users is sufficient to flush out obscure bugs.  In her
view (not surprisingly), software testers will look at software with a
unique eye.  (Which I think is true--but an army of users also does pretty
well.)
2.   SAS has a long history of quality, and their market niche requires
them to pay close attention to quality.  This distinguishes them from
Microsoft, which has little financial incentive to pay close attention to
quality, and does not have a history of quality despite a large group of
professional software testers.

She and I agreed that if one must know for certain that a particular
function works, one must test it or find documentation indicating precisely
how someone else tested it.  Fortunately R packages come with test cases,
but they're not usually test cases designed to check a large number of
possible failure mechanisms.
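A test case that documents precisely how a function was checked can be as simple as comparing R's result with an independently worked-out answer, plus a few edge cases. What follows is only a minimal sketch in base R: the data vector and the choice of var() are hypothetical illustrations, not test cases taken from R or any package.

```r
# Hypothetical validation test cases for base R's var().
# Each check compares R's answer with one worked out independently.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Known answer: mean(x) = 5, squared deviations sum to 32, so var = 32 / 7
stopifnot(isTRUE(all.equal(var(x), 32 / 7)))

# Same answer from the textbook formula, as a second independent check
n <- length(x)
stopifnot(isTRUE(all.equal(var(x), sum((x - sum(x) / n)^2) / (n - 1))))

# Edge case: a missing value propagates unless na.rm = TRUE
y <- c(x, NA)
stopifnot(is.na(var(y)))
stopifnot(isTRUE(all.equal(var(y, na.rm = TRUE), 32 / 7)))

# Edge case: a single observation yields NA rather than an error
stopifnot(is.na(var(5)))

cat("all var() checks passed\n")
```

Writing the expected values down by hand, rather than pasting in R's own output, is what makes a check like this evidence instead of a tautology.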

My take on this is as follows:
1.   There seem to be two varieties of validation involved here.  The first
provides clear assurance that a specific application does a specific thing.
This is what software validation should really be, and no software, not
even SAS, is above this.  Then there is "warm and fuzzy" validation that
offers limited assurance that the software is generally of good quality.
This is subjective, a matter of reputation, and there is no testing or
documentation that can definitively address this ill-defined criterion.  A
software package could be excellent, with only one bug, but if your
application hits that bug, you have a problem.
2.   I think this thread is mainly addressing the "warm and fuzzy"
validation model.  R is going to encounter skepticism among people who
haven't been exposed to it before, especially if they also have not been
exposed to other open-source software (OSS).  In my experience, people who
have not been involved in any software development expect corporate support
to lead to quality software ("they have resources!").  We all know this is
a fallacy, but you can't argue it away, you just have to demonstrate the
software.  When they become familiar with it, they'll stop asking for the
warm and fuzzy validation.

If my reading of the situation is correct, then the right response is to
dazzle.  The warm-and-fuzzy validation is really an opportunity for a
software demo.  Demonstrate the functions you're likely to use, especially
(following Dr. Harrell's advice) using simulation.  Then repeat the
simulation but with outliers added, and use robust methods.  Read in a CSV
file from a network drive, create some beautiful plots, save the data in
compressed format and document file size (also document the original CSV's
file size), read the data back into a concurrently-running R process and
show it's the same.  Install a particularly impressive and esoteric package
that's remotely related to your problem and document what it does.
Generate pseudorandom data using three different generators, from a given
seed, and then reproduce the data.  Calculate P(Z <= -20) for Z ~ N(0, 1),
then calculate P(Z > 20) using lower.tail = F.
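A few of these demo items can be scripted so that the printed output doubles as documentation. This is a sketch in base R under stated assumptions: the seed, the three generator names (taken from the list accepted by RNGkind()), and the toy data frame are illustrative choices, and the compressed-file round trip is done within one session rather than in a second, concurrently running R process.

```r
## 1. Reproducible pseudorandom streams from three generators
kinds <- c("Mersenne-Twister", "Wichmann-Hill", "Knuth-TAOCP-2002")
seed <- 20030418  # arbitrary seed chosen for illustration
for (k in kinds) {
  set.seed(seed, kind = k)
  first <- runif(5)
  set.seed(seed, kind = k)
  second <- runif(5)
  # Resetting the seed must reproduce the stream exactly
  stopifnot(identical(first, second))
  cat(k, ":", format(first, digits = 6), "\n")
}
RNGkind(kind = "default")  # restore the default generator

## 2. Extreme tail probabilities of N(0, 1)
p_lower <- pnorm(-20)                     # P(Z <= -20)
p_upper <- pnorm(20, lower.tail = FALSE)  # P(Z > 20), computed directly
naive   <- 1 - pnorm(20)                  # pnorm(20) rounds to 1, so this is 0
print(c(p_lower = p_lower, p_upper = p_upper, naive = naive))
stopifnot(p_lower > 0, isTRUE(all.equal(p_lower, p_upper)))

## 3. Compressed save, file sizes, and reload
set.seed(1)
df <- data.frame(id = 1:1000, value = rnorm(1000))
csv_file <- tempfile(fileext = ".csv")
rds_file <- tempfile(fileext = ".rds")
write.csv(df, csv_file, row.names = FALSE)
saveRDS(df, rds_file, compress = "xz")
print(c(csv_bytes = file.size(csv_file), rds_bytes = file.size(rds_file)))
back <- readRDS(rds_file)
stopifnot(identical(df, back))  # reloaded data matches the original exactly
```

The tail-probability step is the telling one: computing 1 - pnorm(20) loses everything to rounding, while asking for the upper tail directly returns the same tiny value as pnorm(-20).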

You will provide only an iota of assurance that a particular future
application will work, but you will have removed all doubt that R is a
serious, rigorous, powerful package.  And that addresses the concerns that
may not be voiced, but are underlying.

-Jim Garrett
Becton Dickinson Diagnostic Systems
