[R] R/S and large datasets - Database access (also Re: SAS and S/R)
Emmanuel Charpentier
charpent at bacbuc.dyndns.org
Tue Nov 27 16:11:47 CET 2001
A consensus seems to be emerging: R excels at exploratory work on
small to medium-sized datasets, while SAS can munch much larger
datasets.
However, I see the "size" problem as a red herring. The objects that
have to stay "in core" are usually much smaller than the dataset. For
example, for problems involving fixed-effects linear models, you need
only some matrices whose size is proportional to the square of the
number of *variables* and the (admittedly large) vector of residuals
(whose size is equal to the number of observations). Other cases
(nonlinear mixed-effects models come to mind) are not as easily tamed,
since any iterative process (such as ML estimation) has to go back to
the original data on every iteration. But at least the time penalty of
reading the data through a database interface pays off by allowing you
to treat problems that would otherwise be intractable.
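To make the linear-model case concrete, here is a minimal sketch in plain Python (not R, and not how any existing package does it). It assumes the data can be read in chunks and accumulates only the p x p matrix X'X and the p-vector X'y. The names accumulate, solve and read_chunks are hypothetical, and the data source is synthetic. It solves the normal equations for brevity; R's lm() uses a QR decomposition, which is numerically more robust.

```python
# Sketch: least-squares fit that streams over the data in chunks.
# Only X'X (p x p) and X'y (length p) are kept in memory.

def accumulate(chunks, p):
    """Sum X'X and X'y over an iterable of chunks of (x_row, y) pairs."""
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for chunk in chunks:
        for x, y in chunk:
            for i in range(p):
                xty[i] += x[i] * y
                for j in range(p):
                    xtx[i][j] += x[i] * x[j]
    return xtx, xty

def solve(a, b):
    """Solve a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - s) / m[i][i]
    return beta

def read_chunks(n_rows, chunk_size):
    """Hypothetical data source: y = 1 + 2*x exactly, with an intercept column.
    In practice each chunk would come from a database cursor or a file."""
    for start in range(0, n_rows, chunk_size):
        stop = min(start + chunk_size, n_rows)
        yield [([1.0, float(k)], 1.0 + 2.0 * k) for k in range(start, stop)]

xtx, xty = accumulate(read_chunks(1000, 100), p=2)
beta = solve(xtx, xty)  # [intercept, slope]
```

The residuals can be obtained the same way, with a second pass over the chunks once beta is known, so they need not all sit in memory at once either.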
I am aware of at least one database access package that lets you access
data without dragging a whole table into memory: the RPgSql package
offers what it calls a "proxy variable", which is an object that behaves,
for all practical purposes, like a dataframe, but is really an interface
to database tables. I see this kind of interface as a way to avoid
overloading core memory with data that are scarcely used.
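The proxy idea can be illustrated with a small sketch. This is neither the RPgSql implementation nor the Rdbi API, just an assumed design written in Python against the standard sqlite3 module. The class TableProxy and the table obs are hypothetical. The object holds a connection and a table name, and every access is turned into a query, so no row data lives in the object itself.

```python
import sqlite3

class TableProxy:
    """Hypothetical stand-in for a database table; stores no row data."""

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table
        # Read the column names only; LIMIT 0 fetches no rows.
        cur = conn.execute(f'SELECT * FROM "{table}" LIMIT 0')
        self.columns = [d[0] for d in cur.description]

    def _check(self, column):
        if column not in self.columns:
            raise KeyError(column)

    def __len__(self):
        cur = self.conn.execute(f'SELECT COUNT(*) FROM "{self.table}"')
        return cur.fetchone()[0]

    def __getitem__(self, column):
        """Return a lazy iterator over one column, fetched row by row."""
        self._check(column)
        cur = self.conn.execute(f'SELECT "{column}" FROM "{self.table}"')
        return (row[0] for row in cur)

    def mean(self, column):
        """Let the database compute the summary instead of pulling the data."""
        self._check(column)
        cur = self.conn.execute(f'SELECT AVG("{column}") FROM "{self.table}"')
        return cur.fetchone()[0]

# Demonstration on a tiny in-memory database (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (x REAL, y REAL)")
conn.executemany("INSERT INTO obs VALUES (?, ?)",
                 [(float(k), 10.0 * k) for k in range(1, 6)])

proxy = TableProxy(conn, "obs")
n_rows = len(proxy)          # COUNT(*) runs in the database
mean_x = proxy.mean("x")     # AVG runs in the database
total_y = sum(proxy["y"])    # streamed one value at a time
```

The design choice is to push summaries (counts, means) down to the database and to stream anything else, so memory use stays independent of the table size.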
Unfortunately, that package is now officially orphaned by its
developer, who states that he now focuses on the next database access
standard: the Rdbi interface, which is currently under development, and
which I don't know a thing about.
So the question is: does the Rdbi interface offer such a proxy to data
still residing in databases?
Or am I barking up the wrong tree and trying to (re-)invent an
oversophisticated virtual memory manager? Should a sufficiently large
swapfile be enough for these "large dataset" problems?
--
Emmanuel Charpentier