[R-pkg-devel] Logging data to disk from C++ using a DBMS

Dirk Eddelbuettel edd at debian.org
Sat Jun 3 18:30:25 CEST 2017


On 3 June 2017 at 17:45, Iñaki Úcar wrote:
| We have a simulator, 'simmer', implemented as a C++ object (with Rcpp).
| This simulator generates a bunch of data, and there are some functions
| defined to bring this data to R space as data frames. The memory usage can
| be a problem depending on the simulation model, so we would like to add
| support for logging data directly to disk while being as transparent as
| possible from the user's perspective.
| 
| Our idea is to define a reusable chunk size and periodically append the
| chunk to a DBIConnection by calling DBI::dbWriteTable from C++. In this
| way, the subsequent analysis may be performed with dplyr no matter which
| storage method is selected, memory or database.
| 
| The question is whether there is a better, more efficient way of achieving
| this goal. Is there any package already doing something similar? Ideas and
| alternatives are welcome. Thanks in advance.

I would do it differently, at least to start. Define a record structure, either
plain in C/C++ or, preferably, in a portable format such as MsgPack (see
RcppMsgPack, which is header-only; MsgPack can be deserialized easily by just
about every language on the planet), and have your C++ layer write it to disk.
Then, when the simulation is done, convert once into a DB or whatever else you
want.
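
As a rough illustration of the "plain C/C++ structure" variant, here is a
minimal sketch that buffers fixed-size records and appends them to a binary
file one chunk at a time, then reads the file back in a single pass. The names
Record, ChunkLogger and read_all, and the record fields, are made up for this
example and are not taken from simmer. Dumping raw structs like this is only
readable on the same platform and compiler; a portable encoding such as MsgPack
would replace the write() and read() calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical fixed-size record; the fields are illustrative only.
struct Record {
    double       time;
    std::int32_t resource_id;
    std::int32_t queue_len;
};

// Buffers records in memory and appends them to a binary file in chunks,
// so at most chunk_size records are held in memory at any time.
class ChunkLogger {
public:
    ChunkLogger(const std::string& path, std::size_t chunk_size)
        : out_(path, std::ios::binary | std::ios::trunc),
          chunk_size_(chunk_size) {
        assert(chunk_size > 0);
        buf_.reserve(chunk_size);
    }

    // Write out whatever is still buffered when the logger goes away.
    ~ChunkLogger() { flush(); }

    void log(const Record& r) {
        buf_.push_back(r);
        if (buf_.size() >= chunk_size_) flush();
    }

    void flush() {
        if (buf_.empty()) return;
        out_.write(reinterpret_cast<const char*>(buf_.data()),
                   static_cast<std::streamsize>(buf_.size() * sizeof(Record)));
        out_.flush();
        buf_.clear();  // keeps the capacity, so the buffer is reused
    }

private:
    std::ofstream       out_;
    std::size_t         chunk_size_;
    std::vector<Record> buf_;
};

// Reads every record back in one pass, e.g. once the simulation is done
// and the data is about to be converted into a data frame or a database.
std::vector<Record> read_all(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::vector<Record> records;
    Record r;
    while (in.read(reinterpret_cast<char*>(&r), sizeof(Record))) {
        records.push_back(r);
    }
    return records;
}
```

The simulation would call log() for each event and let the destructor flush
the final partial chunk; the conversion step then calls read_all() exactly
once.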

You can do it all at once and have your simulation hand off data to a logger
in a separate thread. But it's more work, harder to debug, and probably not
really needed.
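
For comparison, the threaded variant could look roughly like the sketch below:
the simulation pushes lines onto a queue and a worker thread drains the queue
to disk. AsyncLogger and its methods are hypothetical names for this example,
and it writes plain text lines rather than any particular record format. The
mutex, condition variable and shutdown flag give a feel for the extra moving
parts involved.

```cpp
#include <cassert>
#include <condition_variable>
#include <fstream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Hypothetical background logger: push() is called from the simulation
// thread, and a single worker thread writes the queued lines to a file.
class AsyncLogger {
public:
    explicit AsyncLogger(const std::string& path)
        : out_(path, std::ios::trunc), worker_(&AsyncLogger::run, this) {}

    ~AsyncLogger() { close(); }

    // Enqueue one line; the file I/O happens on the worker thread.
    void push(std::string line) {
        {
            std::lock_guard<std::mutex> lock(m_);
            assert(!done_ && "push() after close()");
            q_.push(std::move(line));
        }
        cv_.notify_one();
    }

    // Drain the queue, stop the worker and wait for it. Safe to call twice.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            if (done_) return;
            done_ = true;
        }
        cv_.notify_one();
        if (worker_.joinable()) worker_.join();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            cv_.wait(lock, [this] { return done_ || !q_.empty(); });
            while (!q_.empty()) {
                std::string line = std::move(q_.front());
                q_.pop();
                lock.unlock();        // do not hold the lock during I/O
                out_ << line << '\n';
                lock.lock();
            }
            if (done_) break;         // queue is empty and close() was called
        }
        out_.flush();
    }

    std::ofstream           out_;
    std::mutex              m_;
    std::condition_variable cv_;
    std::queue<std::string> q_;
    bool                    done_ = false;
    std::thread             worker_;  // declared last: starts after the rest
};
```

Note that on some toolchains this needs to be built with -pthread.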

Dirk

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org


