[R] R usage for log analysis
Allen S. Rout
asr at ufl.edu
Mon Jun 12 18:06:36 CEST 2006
"Gabriel Diaz" <gabidiaz at gmail.com> writes:
> I'm taking an overview to the project documentation, and seems the
> database is the way to go to handle log files of GB order (normally
> between 2 and 4 GB each 15 day dump).
> In this document http://cran.r-project.org/doc/manuals/R-data.html,
> says R will load all data into memory to process it when using
> read.table and such. Using a database will do the same? Well,
> currently i have no machine with > 2 GB of memory.
Remember, there's swap too. Exceeding physical RAM means you're
spending more time, not running into a hard limit.
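If you want a ballpark before committing, something like this (an
untested sketch; the file name "logs.txt" is made up) reads a slice
with read.table and scales by the line count:

  ## Estimate how much RAM the full read.table would need.
  sample  <- read.table("logs.txt", nrows = 10000)
  per.row <- as.numeric(object.size(sample)) / nrow(sample)
  n.rows  <- length(count.fields("logs.txt"))  # walks the whole file once
  cat("Roughly", per.row * n.rows / 2^20, "MB in RAM\n")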
If you're concerned about gross size, then preprocessing could be
useful; but consider: RAM is cheap. Calibrate RAM purchases
w.r.t. hours of your coding time, -before- you start the project.
Then you can at least mutter to yourself when you waste more than the
cost of core trying to make the problem small. :)
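For the preprocessing route, you don't need the whole file in R at
once; a connection plus readLines lets you filter chunk by chunk
(sketch only; the "ERROR" pattern and file names are made up):

  con <- file("dump.log", open = "r")
  out <- file("errors.log", open = "w")
  repeat {
    chunk <- readLines(con, n = 100000)   # read 100k lines at a time
    if (length(chunk) == 0) break
    writeLines(grep("ERROR", chunk, value = TRUE), out)  # keep matches
  }
  close(con); close(out)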
It's entirely reasonable to do all your development work on a smaller
set, and then dump the real data into it and go home. Unless you've
got something O(N^2) or so, you should be fine.
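In practice that can be as simple as capping nrows= while you develop,
then dropping the cap for the real run (sketch; the file name and
column layout are assumed):

  ## Small slice for development; quote/comment.char off helps raw logs.
  dev  <- read.table("dump.log", nrows = 50000, quote = "", comment.char = "")
  ## ... build and test the analysis against 'dev' ...
  ## Same call without nrows= for the full 2-4 GB dump; specifying
  ## colClasses here saves both time and memory on a file that size.
  full <- read.table("dump.log", quote = "", comment.char = "")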
- Allen S. Rout