[R-SIG-Finance] Dealing with large dataset in quantmod

Brian G. Peterson brian at braverock.com
Thu Jan 12 20:14:00 CET 2012


On Thu, 2012-01-12 at 19:46 +0100, Gabriele Vivinetto [Public address]
wrote:
> Hello to the mailing list.
> I'm a newbie in R, and this is my first post.
> I've evaluated quantmod using EOD data from yahoo, and everything went fine.
> I have a MySQL database containing tick-by-tick data (in a format
> suitable for quantmod's getSymbols.MySQL) and I have tried to use this data.
> With a table holding a small subset of the data (1000 rows) there is no
> problem.
> But if I try to use a table with all the records (~6 million rows), R is
> very slow and memory-hungry (simply put, it crashes every time after
> loading the data...).
> As a workaround I've modified the getSymbols.MySQL R function to accept
> from= and to= parameters, so the SQL SELECT gives R only the desired
> subset of data, but using more than 100k records is still a pain.
> Does anyone have a workaround or suggestions for using large datasets
> with quantmod?
> 
> Thank you !
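
For reference, that kind of date-bounded SELECT can also be done directly
with DBI/RMySQL before handing the result to xts.  A minimal sketch (the
connection details, table name, and the "datetime", "price" and "volume"
column names are assumptions, not your actual schema):

  library(RMySQL)
  library(xts)

  con <- dbConnect(MySQL(), user = "user", password = "pass",
                   dbname = "ticks", host = "localhost")

  ## do the subsetting in MySQL so only the rows you need reach R
  qry <- paste("SELECT datetime, price, volume FROM my_symbol",
               "WHERE datetime >= '2012-01-09' AND datetime < '2012-01-10'")
  raw <- dbGetQuery(con, qry)
  dbDisconnect(con)

  ## keep only the numeric columns in the xts object
  x <- xts(raw[, c("price", "volume")],
           order.by = as.POSIXct(raw$datetime, tz = "GMT"))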

I routinely use xts on tick data series with tens or hundreds of
millions of rows.  I also have a lot of RAM (16-48GB) per machine. 

Some things that will affect how much RAM the xts object uses are the
number of columns in your data and whether you are using a numeric or
character xts object.

We just ran a little test here: 17.5M rows in one column take about a
third of a GB of RAM.
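
For a rough check of your own footprint, something along these lines (a
sketch; exact sizes vary by platform, and a one-column numeric xts stores
roughly 8 bytes of data plus 8 bytes of index per row):

  library(xts)
  n <- 17.5e6
  x <- xts(rnorm(n),
           order.by = as.POSIXct("2012-01-01", tz = "GMT") + seq_len(n))
  print(object.size(x), units = "Mb")   # ~16 bytes/row, roughly 270 Mb
  rm(x); gc()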

In a 32-bit environment, R is limited to 3 GB of RAM, so this may be part
of your problem.
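
If you are not sure which build you are running, a couple of quick checks
(memory.limit() is Windows-only):

  .Machine$sizeof.pointer   # 4 on a 32-bit build, 8 on a 64-bit build
  R.version$arch            # e.g. "i386" vs "x86_64"
  if (.Platform$OS.type == "windows") memory.limit()   # current limit in Mb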

Lastly, you don't say which functions you are calling that respond slowly.
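
As a first step, wrapping the slow call in system.time(), or profiling a
longer run with Rprof(), will show where the time actually goes.  A sketch
(the tick_xts object and the to.minutes() call are placeholders for
whatever you are running):

  system.time(m <- to.minutes(tick_xts))   # time one aggregation call
  Rprof("quantmod.prof")
  ## ... run the slow code here ...
  Rprof(NULL)
  summaryRprof("quantmod.prof")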

Regards,

   - Brian

-- 
Brian G. Peterson
http://braverock.com/brian/
Ph: 773-459-4973
IM: bgpbraverock
