[Rd] stopping finalizers

Hadley Wickham h.wickham at gmail.com
Sat Feb 16 02:32:41 CET 2013


> The subset table isn't a copy of the subset, it contains the unique key and
> an indicator column showing whether the element is in the subset.  I need
> this even if the subset is never modified, so that I can join it to the main
> table and use it in SQL 'where' conditions to get computations for the right
> subset of the data.

Cool. Is that faster than storing a column that just contains the
indices of the included rows?
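
For concreteness, here is roughly how I read the indicator-table approach
described above. This is only a sketch under assumptions: it uses Python's
built-in sqlite3 module as a stand-in for MonetDB, and the table and column
names (main, subset, id, in_subset, age, income) are made up for
illustration, not taken from sqlsurvey.

```python
import sqlite3

# Hypothetical schema and data; SQLite stands in for MonetDB here.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE main (id INTEGER PRIMARY KEY, age INTEGER, income REAL)")
con.executemany(
    "INSERT INTO main VALUES (?, ?, ?)",
    [(1, 25, 30000.0), (2, 40, 52000.0), (3, 67, 18000.0), (4, 51, 75000.0)],
)

# Subset table: one row per key plus a 0/1 indicator, not a copy of the data.
con.execute("CREATE TABLE subset (id INTEGER PRIMARY KEY, in_subset INTEGER)")
con.execute("INSERT INTO subset SELECT id, age >= 40 FROM main")

# Join on the key and filter on the indicator, so the aggregation
# runs inside the database rather than in the client.
n, mean_income = con.execute(
    "SELECT COUNT(*), AVG(m.income) "
    "FROM main AS m JOIN subset AS s ON m.id = s.id "
    "WHERE s.in_subset = 1"
).fetchone()
print(n, mean_income)
```

Only the count and the mean come back to the client; the rows themselves
never leave the database.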

>  The whole point of this new sqlsurvey package is that most of the
> aggregation operations happen in the database rather than in R, which is
> faster for very large data tables.  The use case is things like the American
> Community Survey and the Nationwide Emergency Department Subsample, with
> millions or tens of millions of records and quite a lot of variables.  At
> this scale, loading stuff into memory isn't feasible on commodity desktops
> and laptops, and even on computers with enough memory, the database
> (MonetDB) is faster.

Have you done any comparisons of MonetDB vs SQLite? I'm interested to
know how much faster it is. I'm working on a package
(https://github.com/hadley/dplyr) that compiles R data manipulation
expressions into other languages (e.g. SQL), and have been wondering
if it's worth considering a column-store like MonetDB.

Hadley

-- 
Chief Scientist, RStudio
http://had.co.nz/


