[R] Reasons to Use R

Thomas Lumley tlumley at u.washington.edu
Thu Apr 12 00:40:29 CEST 2007


On Wed, 11 Apr 2007, Alan Zaslavsky wrote:
> I have thought for a long time that a facility for efficient rowwise
> calculations might be a valuable enhancement to S/R.  The storage of the
> object would be handled by a database and there would have to be an
> efficient interface for pulling a row (or small chunk of rows) out of the
> database repeatedly; alternatively the operatons could be conducted inside
> the database.  Basic operations of rowwise calculation and cumulation
> (such as forming a column sum or a sum of outer-products) would be
> written in an R-like syntax and translated into an efficient set of
> operations that work through the database.  (Would be happy to share
> some jejune notes on this.)  However the main answer to this problem
> in the R world seems to have been Moore's Law.  Perhaps somebody could
> tell us more about the S-Plus large objects library, or the work that
> Doug Bates is doing on efficient calculations with large datasets.
>


I have been surprised to find how much you can get done in SQL, transferring
only summaries of the data into R.  There is soon going to be an experimental
"surveyNG" package that works with survey data stored in a SQLite database
without transferring the whole thing into R for most operations (and I could
get further if SQLite had the log() and exp() functions that most other SQL
implementations for large databases provide).  I'll be submitting a paper on
this to useR2007.
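
As a rough illustration of the "summaries only" idea, the sketch below lets
SQLite compute per-stratum sums and brings back one row per stratum, with the
final division done in R.  It is not the surveyNG interface: the table name
`survey`, its columns, and the toy data are all hypothetical, and it assumes
the DBI and RSQLite packages are installed.

```r
library(DBI)

# Hypothetical example data; in practice the table would already live in
# the database rather than being written from R.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
survey <- data.frame(
  stratum = c("a", "a", "b", "b", "b"),
  weight  = c(10, 20, 5, 5, 10),
  income  = c(100, 200, 50, 70, 90)
)
dbWriteTable(con, "survey", survey)

# The database does the row-level work; R receives one row per stratum.
sums <- dbGetQuery(con, "
  SELECT stratum,
         COUNT(*)             AS n,
         SUM(weight)          AS sum_w,
         SUM(weight * income) AS sum_wy
  FROM survey
  GROUP BY stratum
  ORDER BY stratum")

# Finish the calculation in R on the small summary table.
sums$wmean <- sums$sum_wy / sums$sum_w
print(sums)

dbDisconnect(con)
```

Only sums and counts cross the database boundary here, so the amount of data
moved into R depends on the number of strata rather than the number of rows.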

The approach of transferring blocks of data into R and using a database just
as backing store will allow more general computation but will be less
efficient than performing the computation in the database, so a mixture of
both is likely to be helpful.  Moore's Law will settle some issues, but there
are problems where it is working to increase the size of datasets just as
fast as it increases computational power.
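
The block-transfer approach might look something like the following sketch,
which accumulates a sum of outer products (X'X) by fetching a fixed number of
rows at a time.  The table `obs`, its columns, the toy data, and the chunk
size are hypothetical choices for illustration; DBI and RSQLite are assumed
to be installed.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
# Hypothetical data: 2500 rows, two numeric columns.
obs <- data.frame(x1 = seq_len(2500) / 100,
                  x2 = rep(c(1, 2, 3, 4, 5), 500))
dbWriteTable(con, "obs", obs)

# Accumulate the sum of outer products X'X one block at a time.
chunk_size <- 1000
xtx <- matrix(0, 2, 2)
res <- dbSendQuery(con, "SELECT x1, x2 FROM obs")
while (!dbHasCompleted(res)) {
  block <- dbFetch(res, n = chunk_size)  # at most chunk_size rows in memory
  if (nrow(block) == 0) break            # nothing left to process
  xtx <- xtx + crossprod(as.matrix(block))
}
dbClearResult(res)
dbDisconnect(con)
```

This particular accumulation could also be written as SUM(x1 * x2) and similar
expressions in SQL; the loop is meant for per-block computations that SQL
cannot express, at the cost of moving every row through R.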


     -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle


