[R] Do you use R for data manipulation?

Wacek Kusnierczyk Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Wed May 13 10:46:20 CEST 2009


Warren Young wrote:
> Farrel Buchinsky wrote:
>> Is R an appropriate tool for data manipulation and data reshaping and
>> data organizing? I think so but someone who recently joined our group
>> thinks not.
>> The new recruit believes that python or another language is a far better
>> tool for developing data manipulation scripts that can be then used by
>> several members of our research group. Her assessment is that R is
>> useful only when it comes to data analysis and working with statistical
>> models.
>
> It's hard to shift people's individual preferences, but impressive
> objective comparisons are easy to come by.  Ask her how many lines it
> would take to do this trivial R task in Python:
>
>     data <- read.csv('original-data.csv')
>     write.csv(data * 10, 'scaled-data.csv')


you might want to learn that this is a question of having the
appropriate libraries.  in r, read.csv and write.csv reside in the
package utils.  in python, you'd use numpy:

    from numpy import loadtxt, savetxt
    savetxt('scaled.csv', loadtxt('original.csv', delimiter=',')*10,
            delimiter=',')

this makes 2 lines, together with importing the library.
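
one caveat: read.csv expects a header row by default, while loadtxt
would choke on a non-numeric header unless you pass skiprows=1.  if
numpy is not at hand, the standard csv module does the job in a few more
lines.  a minimal sketch, assuming a header row followed by purely
numeric cells; the function name scale_csv is made up for illustration:

```python
import csv

def scale_csv(src, dst, factor=10):
    """Copy src to dst, multiplying every numeric cell by factor."""
    with open(src, newline='') as fin, open(dst, 'w', newline='') as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        # assumption: the first row is a header; pass it through unchanged
        writer.writerow(next(reader))
        for row in reader:
            # assumption: every remaining cell parses as a number
            writer.writerow([float(cell) * factor for cell in row])
```

calling scale_csv('original.csv', 'scaled.csv') would then mirror the
two-liner above, header included.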


>
> R's ability to do something to an entire data structure -- or a slice
> of it, or some other subset -- in a single operation is very useful
> when cleaning up data for presentation and analysis.  

but this is *hardly* r-specific.  you can do that in many, many
languages, rest assured.  just look around.
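
for instance, numpy arrays support the same kind of whole-structure and
slice-wise operations.  a small sketch with made-up data, not anything
from the original poster's files:

```python
import numpy as np

# made-up 2x3 array standing in for a numeric data table
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

scaled = data * 10     # the entire structure in one operation (new array)
data[:, 1] *= 10       # a slice: only the second column, modified in place
data[data > 5] = 0     # some other subset: every element greater than 5
```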

> Also point out how easy it is to get data *out* of R, as above, not
> just into it, so you can then hack on it in Python, if that's the
> better language for further manipulation.
>
> If she gives you static about how a few more lines are no big deal,
> remind her that it's well established that bug count is always a
> simple function of line count.  This fact has been known since the 70's.

that's a slogan, esp. when you think of how compact (but unreadable, and
thus error-prone) code written in perl can be.  often, more lines of
code make a program easier to maintain, and thus help avoid bugs.


>
> While making your points, remember that she has a good one, too: R is
> not the only good language out there.  You should learn Python while
> she's learning R.

+1



