[R] Do you use R for data manipulation?

Frank E Harrell Jr f.harrell at vanderbilt.edu
Wed May 6 14:25:39 CEST 2009


I second what Zeljko wrote.  In addition, see the data manipulation 
section in Chapter 4 of 
http://biostat.mc.vanderbilt.edu/wiki/pub/Main/RS/sintro.pdf

Frank

Zeljko Vrba wrote:
> Sorry for replying to the wrong person; I lost the original email.
> 
>> Farrel Buchinsky wrote:
>>> Is R an appropriate tool for data manipulation and data reshaping and data
>>> organizing? I think so but someone who recently joined our group thinks 
>>> not.
>>> The new recruit believes that python or another language is a far better
>>> tool for developing data manipulation scripts that can be then used by
>>> several members of our research group. Her assessment is that R is useful
>>> only when it comes to data analysis and working with statistical models.
> 
> I personally started to use R because I got tired of manually writing scripts
> for data manipulation and processing.  The argument of your new recruit smells
> of ignorance and resistance to learning something new.  Ask her _how_ she
> assessed R, how much time she spent on her assessment, and whether she
> actually tried to run it and perform some concrete simple tasks.
> 
> (Yes, R is somewhat "different", it has a steep learning curve, but the effort
> of learning it is worth it.  And yes, R can be used in the same way as any
> other scripting language, i.e., it is not restricted to interactive work.)
> 
> Take a look at the plyr and reshape packages (http://had.co.nz/); I have a
> hunch that they would have saved me a lot of headaches had I found out about
> them earlier :)
> 
> I would also recommend investing in Phil Spector's book "Data Manipulation with
> R"; it will get you started much faster.
> 
> I also find R's image files very convenient for sharing data (and code!) in a
> very compact format (single file, portable across architectures).  When you
> quit your R session and choose to save the workspace, all the variables and
> functions get saved in the image file (.RData by default), which you can take
> with you (or send to somebody else), start R again, load the image into a new
> session and continue from where you left off.  You won't get this kind of
> automatic persistence in any scripting language out of the box.
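
The image-file workflow described above can be sketched in base R as follows.
This is only an illustration under stated assumptions: the object names
(measurements, double_it) are hypothetical, a temporary file stands in for the
.RData file, and loading into a fresh environment stands in for starting a new
session.

```r
# Hypothetical objects standing in for a session's data and code.
measurements <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.7))
double_it <- function(x) 2 * x

# Write the selected objects to a single image file (a temp file here).
# save.image() would instead write the whole workspace.
image_file <- tempfile(fileext = ".RData")
save(measurements, double_it, file = image_file)

# Simulate a new session by loading into a clean environment.
# load() returns the names of the objects it restored.
fresh <- new.env()
loaded_names <- load(image_file, envir = fresh)

print(sort(loaded_names))
print(fresh$double_it(fresh$measurements$value))
```

Both the data frame and the function come back from the one file, which is the
point being made about sharing data and code together.
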
> 
>>> So what do you think:
>>> 1) R is a phenomenally powerful and flexible tool and since you are going to
>>> do analyses in R you might as well use it to read data in and merge it and
>>> reshape it to whatever you need.
>>> OR
>>> 2) Are you crazy? Nobody in their right mind uses R to pipe the data around
>>> their lab and assemble it for analysis.
> 
> I'd go with 1).  R also has interfaces to databases through RODBC, so you
> do not have to go through several conversions when you're about to process or
> plot data in R.
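
As a rough sketch of what option 1) can look like, here is a merge followed by
a long-to-wide reshape using only base R. The two data frames and their column
names are made up for illustration; in practice they might come from
read.csv() or a database query.

```r
# Hypothetical lab data: subject info and long-format measurements.
subjects <- data.frame(id = c(1, 2, 3), group = c("a", "b", "a"))
visits <- data.frame(
  id    = c(1, 1, 2, 2, 3, 3),
  visit = c(1, 2, 1, 2, 1, 2),
  score = c(10, 12, 9, 11, 14, 15)
)

# Merge the two tables on the shared key column.
merged <- merge(subjects, visits, by = "id")

# Reshape from long (one row per visit) to wide (one row per subject);
# the score column becomes score.1 and score.2.
wide <- reshape(merged, idvar = c("id", "group"), timevar = "visit",
                direction = "wide")
print(wide)
```

The reshape and plyr packages mentioned earlier offer alternative interfaces
for the same kind of task; base R is used here only to keep the sketch
self-contained.
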
> 


-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University



