[R] R and clinical studies
Frank E Harrell Jr
f.harrell at vanderbilt.edu
Tue Mar 20 13:38:11 CET 2007
Cody_Hamilton at Edwards.com wrote:
> Thank you to all those that responded to Delphine's original post on R and
> clinical studies. They have provided much food for thought.
> I had a couple of follow-up questions/comments. Andrew is very correct in
> pointing out that there are classes and workshops available for R. It's my
> understanding that there are even commercial versions of R that now provide
> formal commercial-style courses. And at any rate, the money saved by
> potentially avoiding pricey software could certainly justify any training
> expense in time or money - this assumes of course that the pricey software
> could be dispensed with (I suspect that would take considerable time at my
> current company as so many legacy projects have been done in proprietary
> software). I still think that R provides less 'hand-holding' and requires
> more initiative (which may be more or less present on a per
> programmer/statistician basis).
> I guess one could always integrate R/Splus in with SAS, as Terry's group
> has done at Mayo - I will probably do this at least as a start. I have a
> few concerns with regards to this approach (these may be needless concerns,
> but I will venture expressing them anyway). First, I'm worried about the
> possibility of compatibility concerns (will anyone be worried about a SAS
> dataset read into R or vice-versa?). Second, I would prefer focusing all
> my learning on one package if possible. I actually have more experience
> with SAS (as do others in my group), and if the switch to R is to be made I
> would like to make that switch as complete as possible. This would also
> avoid requiring new hires to know both languages. Third, if SAS is to be
> kept around, it defeats one of the main advantages of having open source
> code in the first place (R is wonderfully free!). Like Mayo, Baylor Health
> (my previous employer) used both Splus and SAS. I was warned that data
> manipulation would be much more difficult in R/Splus than it was in SAS.
> To be honest, and I say this humbly realizing that most posters to this
> list have much more experience than I, I haven't found data manipulation to
> be that much more difficult in R/Splus (at least as I have gained
> experience in R/Splus). I can think of two exceptions: (1) large datasets
> and (2) SAS seems to play nicer with MS products (e.g. PROC IMPORT seemed
> to read in messy Excel spreadsheets better than importData in Splus). Is
> it possible (and I again say this with MUCH humility) that the perceived
> advantages of SAS with regards to data manipulation may be due in part to
> some users only using R/Splus for stat modeling and graphics (thus never
> becoming familiar with the data manipulation capabilities of R/Splus) or to
> the reluctance of SAS-trained individuals and companies to make the
> complete switch?
You are exactly correct on this point. Some graduate programs only
teach students how to use R/S-Plus for modeling and graphics. R/S-Plus
are wonderful for data manipulation - more powerful than SAS but not
easy to learn (plus in R there are sometimes too many ways to do
something, and new users get lost - e.g. the reshape function in base R,
the reShape function in Hmisc, and the reshape package).
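
To make one of those overlapping tools concrete, here is a minimal sketch that uses only the base R reshape function to turn one-row-per-subject data into one-row-per-visit data. The data frame and its column names (id, sbp.1, sbp.2) are hypothetical, made up for illustration; the other tools mentioned above can produce the same layout in their own ways.

```r
# Hypothetical wide data: one row per subject, one column per visit
wide <- data.frame(id    = 1:2,
                   sbp.1 = c(120, 135),
                   sbp.2 = c(118, 130))

# Stack the two visit columns into a single 'sbp' column,
# recording which visit each value came from in 'visit'
long <- reshape(wide,
                direction = "long",
                varying   = c("sbp.1", "sbp.2"),
                v.names   = "sbp",
                timevar   = "visit",
                times     = 1:2,
                idvar     = "id")

print(long)
```

The result has one row per subject per visit, with columns id, visit and sbp.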
http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RS/sintro.pdf has many
examples of complex data manipulation as do some web sites. We do
analysis for pharmaceutical companies with 100% of the data manipulation
done in R after importing, say, 50 SAS datasets into R. Tasks such as
finding the lab value measured closest in time to some event are much
more elegant in R/S-Plus than in SAS.
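
As a rough illustration of the closest-lab-value task, here is a minimal sketch in base R. It is only one of several ways to do this. The data frames events and labs, their column names, and all values are invented for the example and do not come from any real study.

```r
# Hypothetical data: one event date per subject, several lab draws per subject
events <- data.frame(id         = c(1, 2),
                     event.date = as.Date(c("2007-02-10", "2007-03-01")))
labs <- data.frame(
  id       = c(1, 1, 1, 2, 2),
  lab.date = as.Date(c("2007-01-15", "2007-02-08", "2007-02-20",
                       "2007-02-01", "2007-03-05")),
  value    = c(4.1, 4.6, 5.0, 3.9, 4.3))

# Attach each subject's event date to every one of that subject's lab records
m <- merge(labs, events, by = "id")

# Absolute distance in days between each lab draw and the event
m$gap <- abs(as.numeric(difftime(m$lab.date, m$event.date, units = "days")))

# Within each subject, keep the single row with the smallest gap
# (which.min returns the first such row if there is a tie)
closest <- do.call(rbind,
                   lapply(split(m, m$id),
                          function(d) d[which.min(d$gap), ]))

print(closest[, c("id", "lab.date", "value", "gap")])
```

With these made-up data, subject 1 keeps the draw taken 2 days before the event and subject 2 keeps the draw taken 4 days after it. Restricting to draws before the event would only need a subset on lab.date <= event.date before the which.min step.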
> Tony, the story about the "famous software" and the "certain operating
> system" at the "large company" was priceless.
> In closing, I should mention that in all posts I am speaking for myself and
> not for Edwards LifeSciences.
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University