[R-SIG-Finance] R Memory Usage

Elliot Joel Bernstein elliot.bernstein at fdopartners.com
Fri Apr 15 16:19:09 CEST 2011


Jeff -

Thanks for your feedback. I was attempting to use data frames, and that -- specifically the use of the 'merge' function -- seemed to be the root of the problem. I read the xts vignette, and it looks interesting, but it's not clear how I should use it for my data. The example in the vignette (using 'sample_matrix') seems to have several variables ('Open', 'Close', etc.) measured over time for a single stock. How would you handle multiple variables measured on multiple stocks over time? Ideally, I would like to have multiple matrices contained in the xts object, one for each variable, with rows indexing time and columns indexing stocks (or a 3-D array, with the third dimension indexing the variable).
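
To make the question concrete, here is a rough sketch of what I have in mind, using made-up tickers and random data (I'm not sure whether this is the idiomatic way to use xts):

library(xts)

dates  <- as.Date("2011-01-03") + 0:4    # five made-up trading days
stocks <- c("AAA", "BBB", "CCC")         # made-up tickers

## one xts matrix per variable: rows index time, columns index stocks
returns <- xts(matrix(rnorm(15), 5, 3, dimnames = list(NULL, stocks)),
               order.by = dates)
volume  <- xts(matrix(rpois(15, 1e6), 5, 3, dimnames = list(NULL, stocks)),
               order.by = dates)

## option 1: keep the per-variable matrices in a plain list
panel <- list(returns = returns, volume = volume)

## option 2: merge into one wide xts, encoding the variable in the column names
colnames(returns) <- paste("ret", stocks, sep = ".")
colnames(volume)  <- paste("vol", stocks, sep = ".")
wide <- merge(returns, volume)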

Thanks.

- Elliot

On Sun, Apr 10, 2011 at 02:14:42PM -0500, Jeffrey Ryan wrote:
> Elliot,
> 
> One of the advantages of posting to the finance list is that those of
> us who work with large data in finance can comment on the tools you
> use as well.
> 
> One thing you didn't mention was which packages you are using, or
> examples of the specific code you are calling.
> 
> Within financial time series, one of the most highly optimized tools
> is xts, precisely because of its memory management and optimizations
> for large data.  Using something ad hoc, such as strings and plain
> data.frames, can cause tremendous issues.
> 
> Another issue is whether you need the full data resident in memory at
> all times.  R's rds format, a database, or out-of-core objects (via
> packages such as mmap or indexing) can greatly improve things; a
> minimal sketch of the rds round trip appears after the quoted thread
> below.
> 
> If you are able to come to the R/Finance conference in Chicago on the
> 29th and 30th of this month, you'll have a chance to talk to some of
> those 'in the trenches' with respect to using R on big data.  And as
> you point out (as did Brian), 800x3000 isn't very large, so your
> case isn't unique.
> 
> Would be great to see you later this month in Chicago!  www.RinFinance.com
> 
> Best,
> Jeff
> 
> 
> 
> On Sun, Apr 10, 2011 at 10:49 AM, Elliot Joel Bernstein
> <elliot.bernstein at fdopartners.com> wrote:
> > This is not specifically a finance question, but I'm working with financial
> > data (daily stock returns), and I suspect many people using R for financial
> > analysis face similar issues. The basic problem I'm having is that with a
> > moderately large data set (800 stocks x 11 years), performing a few
> > operations such as data transformations, fitting regressions, etc., results
> > in R using an enormous amount of memory -- sometimes upwards of 5GB -- even
> > after using gc() to try to free up some memory. I've read several posts to
> > various R mailing lists over the years indicating that R does not release
> > memory back to the system on certain OSs (64 bit Linux in my case), so I
> > understand that this is "normal" behavior for R. How do people typically
> > work around this to do exploratory analysis on large data sets without
> > having to constantly restart R to free up memory?
> >
> > Thanks.
> >
> > - Elliot Joel Bernstein
> >
> 
> 
> 
> -- 
> Jeffrey Ryan
> jeffrey.ryan at lemnica.com
> 
> www.lemnica.com
> 
> R/Finance 2011 April 29th and 30th in Chicago | www.RinFinance.com
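
For reference, a minimal sketch of the rds round trip Jeff mentions above (file and object names are made up; the database and mmap/indexing routes would look different):

returns <- matrix(rnorm(800 * 3000), nrow = 3000, ncol = 800)  # toy data

saveRDS(returns, file = "daily_returns.rds")  # write a compressed .rds file
rm(returns)                                   # drop the in-memory copy
gc()                                          # let R reclaim the memory

## ...later, reload only when it is actually needed
returns <- readRDS("daily_returns.rds")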

-- 
Elliot Joel Bernstein, Ph.D. | Research Associate | FDO Partners, LLC
134 Mount Auburn Street | Cambridge, MA | 02138
Phone: (617) 503-4619 | Email: elliot.bernstein at fdopartners.com


