[R] Use of R in clinical trials

John Sorkin jsorkin at grecc.umaryland.edu
Fri Feb 19 19:13:41 CET 2010

A thoughtful, well-reasoned discussion. I welcome this kind of analysis. (I was not criticizing Bert; I was using his post as an example of an unreasonable statement made to him, not by him, that can serve as an object lesson for all of us.)
I am unhappy with posts that attack SAS, R or any other language just because the language is not R or SAS. Thoughtful comments like yours will get people to think about their choice of programming language. Many will choose R and that is good.
John Sorkin
JSorkin at grecc.umaryland.edu 
-----Original Message-----
From: Marc Schwartz <marc_schwartz at me.com>
To: John Sorkin <jsorkin at grecc.umaryland.edu>
To: Gunter Bert <gunter.berton at gene.com>
Cc: Dieter Menne <dieter.menne at menne-biomed.de>
Cc:  <r-help at r-project.org>

Sent: 2/19/2010 12:55:36 PM
Subject: Re: [R] Use of R in clinical trials

On Feb 19, 2010, at 6:56 AM, John Sorkin wrote:

> Bert,
> There is a lesson here. Just as intolerance of any statistical analysis program (or system) other than SAS should lead to our being driven crazy, so too should intolerance of
> any statistical analysis program (or system) other than R.
> John  
>>>> Dieter Menne <dieter.menne at menne-biomed.de> 2/19/2010 3:46 AM >>>
> Bert,
> I like your comments. There is one issue, however, that drives me crazy
> whenever I meet a customer asking "You are not using SAS? Too bad, we need
> validated results."
> Bert Gunter wrote:
>> ...
>> Also to reiterate, it's not only
>> statistical/reporting functionality but even more the integration into the
>> existing clinical database systems that would have to be rewritten **and
>> validated**. 
> Implicitly: Even if you let your cat enter SAS code, the results are
> correct, because SAS is validated.
> Dieter

If I may, let me offer some comments, which in part, are supportive of Bert's perspective.

First, the notion of validation that Bert raised should not be interpreted as indicating that SAS in a vacuum "out of the box" is validated, as if there were a parallel to a Good Housekeeping Seal of Approval for statistical software. There is no such thing for any software in this domain.

Validation, in the context of regulated clinical trials (which we address in the R-FDA document available at http://www.r-project.org/doc/R-FDA.pdf), is defined by the FDA as: "Establishing documented evidence which provides a high degree of assurance that a specific process will consistently produce a product meeting its predetermined specifications and quality attributes." That is not something that can be provided by the vendor; it can only be done by the end user and their organization.

Now, that language is of course subject to interpretation, as FDA guidance is just that, "guidance". It is not prescriptive. One takes a risk mitigation based approach to implementing internal procedures and policies.

Internal validation is done via written Standard Operating Procedures (SOPs) that have been created, reviewed and approved by the end user's organization. The entire data path from the source data base to final report output and data sets must be tested to assure reliability and reproducibility. Thus, there is a significant amount of time and cost involved with this process and this is what Bert was referring to, which goes above and beyond the initial cost of the software and any annual licensing and support costs. It needs to be done irrespective of the software tool chain that one is using.

The scope (and therefore the cost) of the validation testing will be heavily affected by the nature of one's environment (e.g. "big pharma" versus a "boutique drug house" versus a medical device company versus an independent contract research organization) and the level of risk mitigation (defined by lawyers) required by the organization.

These procedures and the associated documentation are also subject to on-site inspection by the FDA, which can shut you down if these are lacking.

It is not that one trusts SAS' output implicitly or by default. It is that one has documented through extensive testing that data manipulation and output are reliable and reproducible and, importantly, that one has also documented known problems (bugs, incorrect results) and workarounds, if any.

The same applies to R. Thus, while R may be "free" in all senses of that word, the actual monetary cost differential relative to software purchase and support is only one part of the equation and therefore the "value proposition" to the organization.

If one is to transition from SAS to R, then one's organization has to evaluate the total costs and risks associated with that transition. SOPs have to be written, reviewed and approved. Data management, analysis and reporting code has to be re-written, tested and validated. Interfaces to database servers have to be tested. Programmers have to be re-trained in a new language and operating paradigm. Senior management has to be brought along to achieve a high level of comfort with anything new that may initially be seen as a risk factor in successfully doing business. A clear business case must be made to them that the advantages outweigh the potential risks.

Hence, there is a lot of resistance to the use of R in the clinical trial realm for these large companies because of those costs and timelines. Add to that the normal human behavioral factors of being resistant to change and the hurdle for R in these large corporate environments is not trivial.

To make the move from the pre-clinical drug discovery realm that Bert and others here work in using R, where some of these issues are not relevant, to the human clinical trial realm, we need to overcome that resistance. The organizations will also need to get to the point where the financial pressures are sufficient that the millions of dollars per year in software licensing costs, and the additional millions for the associated FTEs, become relevant from a bottom line perspective. When they get to that point and the value of R becomes clear to them, progress will be made. It will be incremental and evolutionary and will happen in some organizations earlier than others based upon their size, profile and operating environment.

I might also point out that companies like SAS are sufficiently profitable that, in time, if pressure on their pricing is brought to bear, they will reduce their pricing to accommodate marketplace realities. They will reduce their margins rather than give up market share.

Part of the motivation to move to R may also be functional requirements that can only be satisfied with R as compared to other tools. Those cannot be subjective "look and feel" characteristics, but rather more objective statistical methodological advantages.

To put some of the drug related costs in perspective, there was just a short letter to the editor in this week's issue of Applied Clinical Trials (an industry publication), which provides some insights into the challenges for drug development and approval, based upon industry outlook research done at Tufts (http://csdd.tufts.edu/news/complete_story/pr_outlook_2010).

The key figures from that study are that it takes over $1 billion U.S. and more than 7 years to take a drug from initial human trials to FDA approval. So based upon those figures alone, even if one spends $10 million per year on SAS licensing costs over the 7 years for a total of $70 million, that is only ~7% of the cost of bringing one big drug to market. Importantly, one has to recognize that the $10 million per year is in reality amortized over a much larger number of trials that are all running at the same time in various phases in the company's drug pipeline. Thus, the real annual software costs attributable to any single drug are far lower as a percentage of the total cost of bringing that drug to market, which further reduces the financial pressure on R&D costs associated with the software alone.
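As a rough check of the arithmetic above, here is a minimal Python sketch. The dollar figures are the round numbers quoted in the text; the count of 20 concurrent trials and the equal split across them are purely illustrative assumptions, not figures from the Tufts research.

```python
# Round figures quoted in the text (U.S. dollars).
drug_cost = 1_000_000_000      # "over $1 billion" from first human trials to approval
years = 7                      # "more than 7 years"
licensing_per_year = 10_000_000  # hypothetical annual SAS licensing spend

total_licensing = licensing_per_year * years
share = total_licensing / drug_cost


def per_drug_share(n_trials: int) -> float:
    """Licensing share per drug if the cost is split equally across
    n_trials concurrent trials (an illustrative assumption only)."""
    return share / n_trials


print(f"total licensing over {years} years: ${total_licensing:,}")
print(f"share of one drug's cost: {share:.1%}")
print(f"share if split across 20 trials: {per_drug_share(20):.2%}")
```

Because the $1 billion and 7 year figures are quoted as lower bounds and move in opposite directions, the 7% is best read as an approximation; any amortization across the pipeline only pushes the per-drug share lower.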

Let's also not forget that one big blockbuster drug can bring in $1 billion in revenue per year post-approval, so that has to be considered as well. The company will recoup 7 years of clinical trial associated costs for that one drug in 1 year. Those revenues will stay high for a number of years, at least until a generic version of the drug is available, and potentially until any changes in payments under health care reform activities, at least here in the U.S., affect the revenue stream.

Just to pick one very large pharma company as an example (without naming it), 2009 revenues were in excess of $20 billion, with net income (profits) in excess of $4 billion. So even if you completely eliminated the $10 million for annual SAS licensing costs, you would have a marginal effect on that company's bottom line. There are bigger fish to fry.
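To make "marginal" concrete, the same kind of back-of-the-envelope calculation can be applied to these figures. This is only a sketch: the revenue and net income values are the lower bounds quoted above ("in excess of"), and it ignores taxes and any costs of switching tools.

```python
# Lower-bound figures quoted in the text for one large pharma company (2009, USD).
revenue = 20_000_000_000
net_income = 4_000_000_000
licensing = 10_000_000  # hypothetical annual SAS licensing spend

# Licensing as a fraction of revenue and of net income.
share_of_revenue = licensing / revenue
share_of_net_income = licensing / net_income

print(f"licensing vs revenue:    {share_of_revenue:.2%}")
print(f"licensing vs net income: {share_of_net_income:.2%}")
```

Since the company's actual figures exceed these bounds, the true fractions would be smaller still.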

The bigger cost savings being realized right now by large pharma, based upon the Tufts report, come from a more aggressive approach to terminating early phase trials when interim evidence becomes available suggesting the lack of viability of the drug. The use of adaptive trial designs is a key part of this change in process. The report shows a decline in recent years of the transition probability from Phase I to Phase II and from Phase II to Phase III. Curiously, the overall success rate of a drug from Phase I to FDA approval has stayed relatively stable at 16%, so there is more to be done here. By terminating unfavorable drug trials earlier, the opportunity to save those costs and re-direct them to more promising drugs is significant. Those costs far outweigh infrastructure costs such as software.

In either case, while being advocates of the use of R, we cannot be blind to the business realities in play in this particular environment. That being said, over the 8+ years that I have now been using R, the progress that has been made is nothing short of phenomenal. The growth of the community and the more recent publicity and recognition by SAS, SPSS and other vendors of R's influence are concrete signs of that progress. 

Importantly, we are seeing the increasing use of R within the FDA and other regulatory bodies, which only serves to further enhance R's position in this domain.

I have every confidence that this trend will continue for the foreseeable future. Progress specifically within industry in the human clinical trials arena will be slow and evolutionary, as I have noted. It will likely take place incrementally in specific domains and for specific profiles of companies. As comfort with R grows, as more statisticians trained in R move from academia to industry, and as other factors become relevant which in turn give rise to opportunities for R, we will continue to see additional growth in this domain.


Marc Schwartz
