[R] managing large datasets with RMySQL
Tamas K Papp
tpapp at Princeton.EDU
Tue Aug 9 18:13:52 CEST 2005
I have a large dataset (about 1 million data points from a
68-dimensional state space, the result of an MCMC simulation) which won't
fit in memory. I think that the only solution for analyzing it is
saving it in a relational database (as it is generated) and then reading
back only portions of the data.
I have installed & initialized MySQL and the RMySQL package (I know
nothing about SQL, unfortunately, but I will try to learn). The code
from section 4.3.1 of the R Data Import/Export manual runs
successfully.
Questions:
1. Should I use dbWriteTable(..., overwrite=FALSE, append=TRUE) for
repeatedly saving the chunks of data?
2. Is it OK to set row.names=FALSE when writing?
3. How do I retrieve only parts of the data? dbReadTable returns the
whole table, if I understand correctly.
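To make questions 1-3 concrete, here is a rough sketch of what I have in
mind, not tested against a real server. The helper names (chunk_query,
save_chunk, read_chunk), the table name "mcmc" and the columns V1, V2 are
made up for illustration; only dbConnect, dbWriteTable, dbGetQuery and
dbDisconnect are actual DBI/RMySQL calls, and I am assuming that
append=TRUE creates the table on the first write.

```r
## Hypothetical helpers; only the DBI calls inside them are real.

## Build a SELECT for some columns and one block of rows.
## MySQL syntax: LIMIT <offset>, <row count>.
chunk_query <- function(table, columns, offset, n) {
  sprintf("SELECT %s FROM %s LIMIT %d, %d",
          paste(columns, collapse = ", "), table,
          as.integer(offset), as.integer(n))
}

## Questions 1 and 2: append one chunk (a data frame) without row names.
save_chunk <- function(con, table, chunk) {
  DBI::dbWriteTable(con, table, chunk, row.names = FALSE, append = TRUE)
}

## Question 3: read back only the requested columns and rows.
read_chunk <- function(con, table, columns, offset, n) {
  DBI::dbGetQuery(con, chunk_query(table, columns, offset, n))
}

## Not run: needs a MySQL server and a database called "test".
if (FALSE) {
  library(RMySQL)
  con <- dbConnect(dbDriver("MySQL"), dbname = "test")
  for (i in 1:3) {
    ## stand-in for one block of MCMC output: 1000 draws, columns V1..V68
    chunk <- as.data.frame(matrix(rnorm(1000 * 68), ncol = 68))
    save_chunk(con, "mcmc", chunk)
  }
  ## 500 rows of two columns, skipping the first 1000 rows
  part <- read_chunk(con, "mcmc", c("V1", "V2"), offset = 1000, n = 500)
  dbDisconnect(con)
}
```

As far as I can tell, MySQL does not promise a stable row order without
an ORDER BY clause, so blocks fetched with LIMIT may need an explicit
index column; I have not worked that part out.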
If somebody has already written code for analyzing data in parts, I
would appreciate it if they could share it.
Thanks,
Tamas