[R] Do you use R for data manipulation?

Emmanuel Charpentier charpent at bacbuc.dyndns.org
Wed May 6 09:52:14 CEST 2009


On Wednesday 06 May 2009 at 00:22 -0400, Farrel Buchinsky wrote:
> Is R an appropriate tool for data manipulation and data reshaping and data
> organizing? 
[ Large Snip ! ... ]

Depends on what you have to do.

I've done what can be more or less termed "data management" with almost
uncountable tools (from Excel (sigh...) to R with SQL, APL, Pascal, C,
Basic (in 1982 !), Fortran and even Lisp in passing...).

SQL has strong points: joins are, to my taste, more easily expressed in
SQL than in most languages, and projection and aggregation are natural.
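
For comparison, here is a minimal sketch of what a join followed by an
aggregation might look like in base R. The tables `patients` and `visits`
and their columns are hypothetical, made up purely for illustration; the
SQL in the comments is only a rough equivalent.

```r
# Hypothetical tables: patients and their visits
patients <- data.frame(id = c(1, 2, 3), sex = c("F", "M", "F"))
visits   <- data.frame(id    = c(1, 1, 2, 3, 3, 3),
                       score = c(10, 12, 7, 5, 6, 7))

# Join: roughly SELECT * FROM patients JOIN visits USING (id)
joined <- merge(patients, visits, by = "id")

# Aggregation: roughly
#   SELECT sex, AVG(score) FROM ... GROUP BY sex
mean_by_sex <- aggregate(score ~ sex, data = joined, FUN = mean)
print(mean_by_sex)
```

Whether this reads better or worse than the SQL version is largely a
matter of taste.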

However, in SQL, there is no "natural" ordering of table rows, which
makes expressing algorithms that rely on this order difficult. Try, for
example, to express the differences of a time series... (it can be done,
but it is *not* a pretty sight).
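
As a point of contrast, a sketch of the same task in base R, where row
order is meaningful once it has been imposed. The data frame `ts_data`
and its columns are hypothetical. In SQL the usual routes are a
self-join on the previous time point, or a window function such as LAG
where the database supports it.

```r
# Hypothetical series, deliberately stored out of order,
# as rows might come back from an SQL table without ORDER BY
ts_data <- data.frame(t = c(3, 1, 4, 2), value = c(9, 1, 16, 4))

# Impose the ordering explicitly, then difference consecutive rows
ts_data <- ts_data[order(ts_data$t), ]
ts_data$delta <- c(NA, diff(ts_data$value))  # first row has no predecessor
print(ts_data)
```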

On the other hand, R has some unique expressive possibilities (reshape()
comes to mind).
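
A minimal sketch of reshape() turning "long" data (one row per subject
and visit) into "wide" data (one row per subject). The data frame `long`
and its column names are assumptions chosen for illustration only.

```r
# Hypothetical long-format data: one row per (id, visit)
long <- data.frame(id    = c(1, 1, 2, 2),
                   visit = c(1, 2, 1, 2),
                   score = c(10, 12, 7, 8))

# One row per id, one score column per visit
wide <- reshape(long, idvar = "id", timevar = "visit",
                v.names = "score", direction = "wide")
print(wide)
```

The resulting columns are named after the measured variable and the
visit number, separated by a dot.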

So I tend to use a combination of tools: except for very small samples,
I tend to manage my data in SQL and with associated tools (think data
editing, for example; a simple form in OpenOffice's Base is quite easy
to create, can handle anything for which an ODBC driver exists, and
won't crap out beyond a few hundred lines...). Finer manipulation
is usually done in R with native tools and sqldf.

But, at least in my trade, the ability to handle Excel files is a must
(Excel is considered a standard for data entry. Sigh...). So the
first task is usually a) to import the data into an SQL database, and
b) to prepare some routines to dump SQL tables / R dataframes to Excel
for returning them to the original data author...
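
One low-tech way to sketch step b) in base R, assuming a CSV file that
Excel can open is acceptable to the data author (writing native Excel
files needs an add-on package, not shown here). The data frame `results`
and the file name are hypothetical.

```r
# Hypothetical data frame to send back to the data author
results <- data.frame(id = c(1, 2), score = c(10.5, 7))

# Write to a temporary location; the file name is an assumption
out_file <- file.path(tempdir(), "results.csv")
write.csv(results, out_file, row.names = FALSE)

# Read it back to check the round trip
check <- read.csv(out_file)
print(check)
```

In locales where Excel expects semicolon separators and decimal commas,
write.csv2() and read.csv2() are the matching variants.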

HTH

					Emmanuel Charpentier



