[R-SIG-Finance] R Memory Usage
mdowle at mdowle.plus.com
Fri Apr 15 17:24:51 CEST 2011
Another option is data.table:
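A minimal sketch of what a data.table approach could look like, using made-up returns in long format (the column names and values here are purely illustrative, and this assumes the data.table package is installed):

```r
library(data.table)  # assumes the data.table package is installed

# Long format: one row per (stock, date) pair, keyed for fast binary search
dt <- data.table(
  stock = rep(c("AAPL", "IBM"), times = 3),
  date  = rep(as.Date("2011-01-03") + 0:2, each = 2),
  ret   = c(0.010, -0.020, 0.003, 0.010, -0.005, 0.002)
)
setkey(dt, stock, date)

dt["AAPL"]                    # keyed subset: all rows for one stock
dt[, mean(ret), by = stock]   # grouped aggregation, no intermediate copies
```

Keyed subsetting uses binary search on the sorted key rather than a full vector scan, which is where the memory and speed savings come from on large tables.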
"Brian G. Peterson" <brian at braverock.com> wrote in message
news:4DA85806.3050700 at braverock.com...
> Since we seem to be top-posting for this thread, I'll continue (ick).
> Typically, we store daily, minute, or tick data as xts objects, one per
> instrument. Much as you would get from getSymbols in quantmod, for
> comparison. (we've written getSymbols methods that are in the
> FinancialInstrument package that are more amenable to disk-based
> persistent storage of tick data).
> When necessary, we subset, align, and cbind this data into combined xts
> objects to get multi-column xts objects. About the only thing that is
> convenient in a data.frame but inconvenient in xts is handling mixed
> numeric and text data. I typically still use xts for these if they will
> be large objects, but you then have to be aware that (like with a
> matrix) all your numeric data will be stored as character data, and
> you'll need to use as.numeric. If you only have numeric data,
> this proviso does not apply. You'll find that merge, cbind, and rbind on
> xts are massively more efficient than the data.frame equivalents.
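Brian's caveat about mixed types is the same one that applies to a plain matrix; a minimal base-R illustration (with made-up values):

```r
# Mixing numeric and text in one matrix coerces everything to character
m <- cbind(price = c(100.1, 101.2), symbol = c("AAPL", "AAPL"))
storage.mode(m)                 # "character"

# Numeric columns must be recovered explicitly, as Brian notes
px <- as.numeric(m[, "price"])
```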
> With your example of 800 stocks, I would likely store each stock as a
> separate xts object, and subset and bind as necessary for your analysis.
> Perhaps in an environment, to keep from cluttering the .GlobalEnv, and
> make it easier to save/load all your data at once.
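One way Brian's environment suggestion might look, with plain numeric vectors standing in for the per-instrument xts objects (the tickers and values are made up):

```r
# A dedicated environment keeps per-stock objects out of .GlobalEnv
stocks <- new.env()

# Stand-ins for per-instrument xts objects
assign("AAPL", c(100.1, 101.2, 100.8), envir = stocks)
assign("IBM",  c(160.0, 161.1, 159.9), envir = stocks)

ls(stocks)                          # list every instrument in one place
aapl <- get("AAPL", envir = stocks)

# Persist the whole collection in one step
save(list = ls(stocks), envir = stocks, file = tempfile(fileext = ".RData"))
```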
> We avoid data.frame for any object that doesn't absolutely require the
> mixed types and factor support of data.frame. It's too inefficient in
> memory and speed for truly large data.
> - Brian
> Brian G. Peterson
> Ph: 773-459-4973
> IM: bgpbraverock
> On 04/15/2011 09:19 AM, Elliot Joel Bernstein wrote:
>> Jeff -
>> Thanks for your feedback. I was attempting to use a data.frame, and that --
>> specifically the use of the 'merge' function -- seemed to be the root of
>> the problem. I read the xts vignette, and it looks interesting, but it's
>> not clear how I should use it for my data. The example in the vignette
>> (using 'sample_matrix') seems to have several variables ('Open', 'Close',
>> etc.) measured over time for a single stock. How would you handle
>> multiple variables measured on multiple stocks over time? Ideally I think
>> I would like to have multiple matrices contained in the xts object, one
>> for each variable, with rows indexing time and columns indexing stocks
>> (or a 3-D array, with the third dimension indexing the variable).
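The layout Elliot describes can be sketched in base R with a 3-D array, rows indexing time, columns indexing stocks, and the third dimension indexing the variable (the dates, tickers, and values below are made up):

```r
dates  <- as.character(as.Date("2011-01-03") + 0:2)
stocks <- c("AAPL", "IBM")
vars   <- c("Open", "Close")

# 3-D array: time x stock x variable
a <- array(NA_real_, dim = c(3, 2, 2), dimnames = list(dates, stocks, vars))
a["2011-01-03", "AAPL", "Close"] <- 100.1

close_mat <- a[, , "Close"]   # dates-by-stocks matrix for one variable
```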
>> - Elliot
>> On Sun, Apr 10, 2011 at 02:14:42PM -0500, Jeffrey Ryan wrote:
>>> One of the advantages to posting to the finance list is that those of
>>> us who work around large data in finance can comment on tools that you
>>> use as well.
>>> One thing you didn't mention specifically was which packages you are
>>> using and maybe examples of specific code you are calling.
>>> Within financial time-series, one of the most optimized tools is xts -
>>> precisely for the reason of memory management and optimizations for
>>> large data. Using something ad hoc - for example, strings and
>>> data.frames - can cause tremendous issues.
>>> Another issue is whether you need the full data resident in memory at
>>> all times. R's rds format, a database, or out-of-core objects (such as
>>> with mmap or indexing) can greatly improve things.
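A small sketch of the rds route Jeff mentions: write a large object to disk, drop the in-memory copy, and reload it only when needed (the object here is random filler data sized like the 800 x 3000 case in the thread):

```r
# Persist a large matrix with R's rds format
x <- matrix(rnorm(800 * 3000), nrow = 3000, ncol = 800)
path <- tempfile(fileext = ".rds")
saveRDS(x, path)

rm(x); invisible(gc())   # release the in-memory copy
y <- readRDS(path)       # reload on demand
dim(y)                   # 3000 800
```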
>>> If you are able to come to the R/Finance conference in Chicago on the
>>> 29th and 30th of this month, you'll have a chance to talk to some of
>>> those 'in the trenches' with respect to using R on big data. And as
>>> you point out (as well as Brian) - 800x3000 isn't very large, so your
>>> case isn't unique.
>>> Would be great to see you later this month in Chicago!
>>> On Sun, Apr 10, 2011 at 10:49 AM, Elliot Joel Bernstein
>>> <elliot.bernstein at fdopartners.com> wrote:
>>>> This is not specifically a finance question, but I'm working with
>>>> data (daily stock returns), and I suspect many people using R for
>>>> analysis face similar issues. The basic problem I'm having is that with a
>>>> moderately large data set (800 stocks x 11 years), performing a few
>>>> operations such as data transformations, fitting regressions, etc.,
>>>> in R uses an enormous amount of memory -- sometimes upwards of 5GB --
>>>> even after using gc() to try and free some memory up. I've read several
>>>> posts to various R mailing lists over the years indicating that R does
>>>> not return memory to the system on certain OSs (64-bit Linux in my
>>>> case), so I understand that this is "normal" behavior for R. How do people
>>>> work around this to do exploratory analysis on large data sets without
>>>> having to constantly restart R to free up memory?
>>>> - Elliot Joel Bernstein
>>> Jeffrey Ryan
>>> jeffrey.ryan at lemnica.com
>>> R/Finance 2011 April 29th and 30th in Chicago | www.RinFinance.com