[R] R and clinical studies

Frank E Harrell Jr f.harrell at vanderbilt.edu
Tue Mar 20 13:38:11 CET 2007

Cody_Hamilton at Edwards.com wrote:
> Thank you to all those who responded to Delphine's original post on R and
> clinical studies.  They have provided much food for thought.
> I had a couple of follow-up questions/comments.  Andrew is quite right in
> pointing out that there are classes and workshops available for R.  It's my
> understanding that there are even commercial versions of R that now provide
> formal commercial-style courses.  And at any rate, the money saved by
> potentially avoiding pricey software could certainly justify any training
> expense in time or money  - this assumes of course that the pricey software
> could be dispensed with (I suspect that would take considerable time at my
> current company as so many legacy projects have been done in proprietary
> software).  I still think that R provides less 'hand-holding' and requires
> more initiative (which may be more or less present on a per
> programmer/statistician basis).
> I guess one could always integrate R/Splus with SAS, as Terry's group
> has done at Mayo - I will probably do this at least as a start.  I have a
> few concerns with regard to this approach (these may be needless concerns,
> but I will venture expressing them anyway).  First, I'm worried about
> possible compatibility problems (will anyone be worried about a SAS
> dataset read into R or vice versa?).  Second, I would prefer focusing all
> my learning on one package if possible.  I actually have more experience
> with SAS (as do others in my group), and if the switch to R is to be made I
> would like to make that switch as complete as possible.   This would also
> avoid requiring new hires to know both languages.  Third, if SAS is to be
> kept around, it defeats one of the main advantages of having open source
> code in the first place (R is wonderfully free!).  Like Mayo, Baylor Health
> (my previous employer) used both Splus and SAS.  I was warned that data
> manipulation would be much more difficult in R/Splus than it was in SAS.
> To be honest, and I say this humbly realizing that most posters to this
> list have much more experience than I, I haven't found data manipulation to
> be that much more difficult in R/Splus (at least as I have gained
> experience in R/Splus).  I can think of two exceptions: (1) large datasets
> and (2) SAS seems to play nicer with MS products (e.g. PROC IMPORT seemed
> to read in messy Excel spreadsheets better than importData in Splus).  Is
> it possible (and I again say this with MUCH humility) that the perceived
> advantages of SAS with regard to data manipulation may be due in part to
> some users only using R/Splus for stat modeling and graphics (thus never
> becoming familiar with the data manipulation capabilities of R/Splus) or to
> the reluctance of SAS-trained individuals and companies to make the
> complete switch?

You are exactly correct on this point.  Some graduate programs teach 
students to use R/S-Plus only for modeling and graphics.  R and S-Plus 
are wonderful for data manipulation - more powerful than SAS, but not 
easy to learn.  In R there are also sometimes too many ways to do the 
same thing, and new users get lost: for example, there is the reshape 
function in base R, the reShape function in Hmisc, and the separate 
reshape package. 
http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RS/sintro.pdf has many 
examples of complex data manipulation, as do some web sites.  We do 
analysis for pharmaceutical companies with 100% of the data manipulation 
done in R, after importing, say, 50 SAS datasets.  Tasks such as 
finding the lab value measured closest in time to some event are 
much more elegant in R/S-Plus than in SAS.
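
As a rough illustration of the closest-lab-value task, here is a minimal 
sketch in base R.  The data frames, column names (id, event_date, 
lab_date, value) and values are made up for the example, and this is only 
one of several ways to do it, not necessarily how we do it in practice.

```r
# Hypothetical data: one event date per patient, several lab draws each.
events <- data.frame(
  id         = c(1, 2),
  event_date = as.Date(c("2007-01-15", "2007-02-10"))
)
labs <- data.frame(
  id       = c(1, 1, 1, 2, 2),
  lab_date = as.Date(c("2007-01-01", "2007-01-14", "2007-02-01",
                       "2007-02-01", "2007-02-12")),
  value    = c(5.1, 5.6, 6.0, 4.2, 4.8)
)

# Attach each patient's event date to every one of that patient's lab draws.
m <- merge(labs, events, by = "id")

# Absolute gap, in days, between each lab draw and the event.
m$gap <- abs(as.numeric(m$lab_date - m$event_date))

# Within each patient keep the row with the smallest gap
# (on ties, which.min keeps whichever tied row comes first).
closest <- do.call(rbind,
                   lapply(split(m, m$id),
                          function(d) d[which.min(d$gap), ]))
rownames(closest) <- NULL
closest
```

Restricting to draws taken before the event would only need a subset of m 
(lab_date <= event_date) before the last step.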


> Tony, the story about the "famous software" and the "certain operating
> system" at the "large company" was priceless.
> In closing, I should mention that in all posts I am speaking for myself and
> not for Edwards LifeSciences.
> Regards,
>     -Cody

Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University