[R-sig-finance] R vs. S-PLUS vs. SAS

Andrew Piskorski atp at piskorski.com
Sat Dec 4 13:15:40 CET 2004


On Fri, Dec 03, 2004 at 06:37:15PM +0000, Patrick Burns wrote:

> There may be some differences between SAS procedures, but
> at least generally SAS does not require the whole data to be in
> RAM.  Regression will take the data row by row and do an update
> for the answer.
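
For concreteness, here is a minimal sketch of that idea in R (purely
illustrative, not SAS's actual algorithm): accumulate the
cross-products X'X and X'y chunk by chunk, so that only the current
chunk ever has to be in RAM.

  ## Sketch of an updating regression: keep only the running
  ## cross-products X'X and X'y, never the full data set.
  p <- 3
  state <- list(XtX = matrix(0, p, p), Xty = matrix(0, p, 1))

  update.fit <- function(state, X, y) {
    state$XtX <- state$XtX + crossprod(X)      # add this chunk's X'X
    state$Xty <- state$Xty + crossprod(X, y)   # add this chunk's X'y
    state
  }

  for (i in 1:10) {                  # ten simulated chunks of 200 rows
    X <- cbind(1, matrix(rnorm(400), ncol = p - 1))
    y <- X %*% c(1, 2, 3) + rnorm(200)
    state <- update.fit(state, X, y) # chunk can be freed afterwards
  }

  beta.hat <- solve(state$XtX, state$Xty)  # agrees with lm() on all data

(Solving the normal equations like this is numerically cruder than the
QR decomposition lm() uses, but it shows why the whole data set never
needs to be resident.)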

Someone might want to ask Joe Conway about his experience and thoughts
integrating R as a procedural language inside PostgreSQL, to create
PL/R:

  http://www.joeconway.com/plr/
  http://gborg.postgresql.org/project/plr/projdisplay.php

(Hm, for good measure, I have Cc'd him on this email.)  Obviously, an
RDBMS like PostgreSQL is expert at dealing with data that doesn't fit
into RAM.  I've no idea whether PL/R does anything special to take
advantage of that, or how feasible it would be to do so.

Does anyone here know much about what makes R dependent on all data
being in RAM, or know of links on the subject?  Is it just some
low-level bits, or do broad swaths of code and algorithms all depend
on the in-RAM assumption?

How do SAS and other such systems avoid that?  Do they do this better
or much more transparently than what an R user would do now manually?
By "manually", I mean: query some fits-in-RAM amount of data out of an
RDBMS (or other such on-disk store), analyze it, delete the data to
free up RAM, and repeat.
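
In R, that manual loop might look something like the following.  This
is purely illustrative: it assumes the DBI and RSQLite packages, a
made-up on-disk table "obs", and the same running-fit trick as the
sketch above.

  library(DBI)
  library(RSQLite)

  ## Stream fits-in-RAM chunks out of an on-disk table and fold each
  ## one into the running cross-products.
  con <- dbConnect(SQLite(), dbname = "big.db")
  res <- dbSendQuery(con, "SELECT x1, x2, y FROM obs")

  p <- 3
  state <- list(XtX = matrix(0, p, p), Xty = matrix(0, p, 1))
  while (!dbHasCompleted(res)) {
    chunk <- fetch(res, n = 10000)            # one fits-in-RAM piece
    X <- cbind(1, chunk$x1, chunk$x2)
    state$XtX <- state$XtX + crossprod(X)
    state$Xty <- state$Xty + crossprod(X, chunk$y)
    rm(chunk)                                 # free the RAM, repeat
  }
  dbClearResult(res)
  dbDisconnect(con)
  beta.hat <- solve(state$XtX, state$Xty)

The statistics stream just fine; what is manual is all the chunking
and bookkeeping, and knowing which analyses can be updated this way
at all.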

Could one, say, tie a lightweight, high-performance RDBMS library like
SQLite into R, and have R use it profitably to scale nicely on data
that does not fit in RAM?  In what way, if any, would this offer a
substantial advantage over current manual R-plus-RDBMS practice?

-- 
Andrew Piskorski <atp at piskorski.com>
http://www.piskorski.com/


