[R-pkg-devel] Advice on in-RAM out of RAM (MonetDB) in data import package

Lucas Ferreira Mation lucasmation at gmail.com
Mon Jul 11 16:32:46 CEST 2016


Just a clarification, I am using MonetDBLite for this.

2016-07-11 11:28 GMT-03:00 Lucas Ferreira Mation <lucasmation at gmail.com>:

> I am writing a package that imports most of the Brazilian socio-economic
> micro datasets
> (microdadosBrasil <https://github.com/lucasmation/microdadosBrasil>). The
> idea of the package is that data import is very simple, so even users with
> very little R programming knowledge can use the data easily.
> Although I would like to have decent performance, the first concern is
> usability.
>
> The package imports data into an in-memory data.table object.
> I am now trying to implement support for out-of-memory datasets using
> MonetDBLite.
>
> Is there an OS-independent way to predict whether a dataset will fit into
> memory or not? Ideally the package would ask the computer for the maximum
> amount of RAM that R can use. The package would then default to
> MonetDBLite if the available RAM was smaller than 3x the in-memory size
> of the dataset.
>
> There will also be an argument that lets the user choose whether to
> use in-RAM or out-of-RAM storage, but if that argument is not provided the
> package would choose automatically.
>
> In any case, does that seem reasonable? Or should I force the user to be
> aware of this?
>
> Another option would be to default to MonetDB (unless the user explicitly
> asks for in-memory data). Is MonetDB performance so good that it would
> not make much of a difference?
>
> Another disadvantage of the MonetDB default is that the user will not be
> able to run base-R data manipulation commands. So he will have to use dplyr
> (which is great and simple) or SQL queries (which few people will know).
>
> Regards,
> Lucas
>
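
To make the proposed default concrete, here is a minimal sketch of the rule from the quoted message: use MonetDBLite when the available RAM is smaller than 3x the in-memory size of the dataset, unless the user chooses a backend explicitly. The function name `choose_backend` and its arguments are hypothetical and not part of microdadosBrasil. The sketch does not answer the open question of how to obtain the available RAM portably; both sizes are simply passed in as bytes.

```r
# Hypothetical helper: pick a storage backend from two sizes given in bytes.
# `dataset_bytes` is the estimated in-memory size of the dataset and
# `available_bytes` is the RAM that R can use; the caller must supply both.
choose_backend <- function(dataset_bytes, available_bytes,
                           backend = c("auto", "data.table", "MonetDBLite"),
                           safety_factor = 3) {
  backend <- match.arg(backend)
  # An explicit user choice always wins over the heuristic.
  if (backend != "auto") {
    return(backend)
  }
  stopifnot(is.numeric(dataset_bytes), dataset_bytes >= 0,
            is.numeric(available_bytes), available_bytes >= 0)
  # Go out of memory when RAM is smaller than safety_factor x dataset size.
  if (available_bytes < safety_factor * dataset_bytes) {
    "MonetDBLite"
  } else {
    "data.table"
  }
}

# Example: a 2 GB dataset with 4 GB of RAM available (4 GB < 3 x 2 GB).
choose_backend(2 * 1024^3, 4 * 1024^3)
```

Keeping the threshold as an argument (`safety_factor`) would let the 3x rule be tuned later without changing the interface.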
