[R-sig-DB] how to deal with a big database in R

christophe dutang dutangc at gmail.com
Wed Sep 17 18:34:09 CEST 2008


Hi,

For the first time, I am really facing the problem of large data sets in R,
and I need some help/advice on this subject.

I have large datasets on an Oracle server (at least one million records) and
I have the following questions:
- is the DBI package efficient when dealing with Oracle? Does the
efficiency of the system depend only on Oracle and not on R?
- is it easy to work with multiple tables from R, and to cross-tabulate
those big tables into a few smaller ones that can then be used in R
(typically for a GLM analysis)? (see the first sketch after this list)
- is it possible to have concurrent access to a database from different R
sessions? Does that rely entirely on Oracle functionality?
- when an SQL query is sent from R, is anything other than the query itself
and its result allocated in R? In other words, is all the data processing
done by the Oracle server?
- how can I proceed if, after cross-tabulating (and the resulting reduction
in size), the data are still too big to work with in R? Do we need to use
the 'biglm' package? (see the second sketch below)
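
To make the cross-tabulation question concrete, here is the kind of code I
have in mind: a minimal sketch using DBI with the ROracle driver, where the
connection details and the table and column names (policy_data, age_band,
region, claim_count, exposure) are made up for illustration. The idea is to
let Oracle do the aggregation so that only the small cross table travels
back to R:

    library(DBI)
    library(ROracle)

    drv <- dbDriver("Oracle")
    ## credentials and dbname below are placeholders
    con <- dbConnect(drv, username = "user", password = "secret",
                     dbname = "mydb")

    ## the GROUP BY runs on the Oracle server; R only receives
    ## the aggregated cross table
    agg <- dbGetQuery(con,
        "SELECT age_band, region,
                SUM(claim_count) AS n_claims,
                SUM(exposure)    AS exposure
           FROM policy_data
          GROUP BY age_band, region")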
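
And for the last question, this is roughly what I imagine doing with
'biglm' if even the reduced data do not fit in memory: fetch the result set
in chunks and update the fit incrementally. This reuses the connection
'con' from the sketch above; the table cross_table and the numeric columns
y, x1, x2 are again hypothetical:

    library(biglm)

    rs <- dbSendQuery(con, "SELECT y, x1, x2 FROM cross_table")

    ## fit on the first chunk, then update with the remaining chunks
    chunk <- fetch(rs, n = 100000)
    fit <- biglm(y ~ x1 + x2, data = chunk)
    while (!dbHasCompleted(rs)) {
        chunk <- fetch(rs, n = 100000)
        fit <- update(fit, chunk)
    }
    dbClearResult(rs)
    dbDisconnect(con)
    summary(fit)

If I understand correctly, for a proper GLM the same package offers
bigglm(), which needs a data source it can re-read, since the fitting makes
several passes over the data. Is that the recommended route?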

Some of these questions will probably have obvious answers for this mailing
list, but I would like to get a clear picture.

Thanks in advance

Kind regards

Christophe Dutang




