[R-SIG-Finance] R Memory Usage
mdowle at mdowle.plus.com
Fri Apr 15 17:24:51 CEST 2011
Another option is data.table:
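A minimal sketch of what a data.table approach could look like, using made-up returns in long format (the column names and values here are purely illustrative, and this assumes the data.table package is installed):

```r
library(data.table)  # assumes the data.table package is installed

# Long format: one row per (stock, date) pair, keyed for fast binary search
dt <- data.table(
  stock = rep(c("AAPL", "IBM"), times = 3),
  date  = rep(as.Date("2011-01-03") + 0:2, each = 2),
  ret   = c(0.010, -0.020, 0.003, 0.010, -0.005, 0.002)
)
setkey(dt, stock, date)

dt["AAPL"]                    # keyed subset: all rows for one stock
dt[, mean(ret), by = stock]   # grouped aggregation, no intermediate copies
```

Keyed subsetting uses binary search on the sorted key rather than a full vector scan, which is where the memory and speed savings come from on large tables.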
"Brian G. Peterson" <brian at braverock.com> wrote in message
news:4DA85806.3050700 at braverock.com...
> Since we seem to be top-posting for this thread, I'll continue (ick).
> Typically, we store daily, minute, or tick data as xts objects, one per
> instrument. Much as you would get from getSymbols in quantmod, for
> comparison. (we've written getSymbols methods that are in the
> FinancialInstrument package that are more amenable to disk-based
> persistent storage of tick data).
> When necessary, we subset, align, and cbind this data into combined xts
> objects to get multi-column xts objects. About the only thing that is
> convenient in a data.frame but inconvenient in xts is handling mixed
> numeric and text data. I typically still use xts for these if they will
> be large objects, but you then have to be aware that (like with a
> matrix) all your numeric data will be stored as character data, and
> you'll need to use as.numeric. If you only have numeric data,
> this proviso does not apply. You'll find that merge, cbind, and rbind on
> xts are massively more efficient than the data.frame equivalents.
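Brian's caveat about mixed types is the same one that applies to a plain matrix; a minimal base-R illustration (with made-up values):

```r
# Mixing numeric and text in one matrix coerces everything to character
m <- cbind(price = c(100.1, 101.2), symbol = c("AAPL", "AAPL"))
storage.mode(m)                 # "character"

# Numeric columns must be recovered explicitly, as Brian notes
px <- as.numeric(m[, "price"])
```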
> With your example of 800 stocks, I would likely store each stock as a
> separate xts object, and subset and bind as necessary for your analysis.
> Perhaps in an environment, to keep from cluttering the .GlobalEnv, and
> make it easier to save/load all your data at once.
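One way Brian's environment suggestion might look, with plain numeric vectors standing in for the per-instrument xts objects (the tickers and values are made up):

```r
# A dedicated environment keeps per-stock objects out of .GlobalEnv
stocks <- new.env()

# Stand-ins for per-instrument xts objects
assign("AAPL", c(100.1, 101.2, 100.8), envir = stocks)
assign("IBM",  c(160.0, 161.1, 159.9), envir = stocks)

ls(stocks)                          # list every instrument in one place
aapl <- get("AAPL", envir = stocks)

# Persist the whole collection in one step
save(list = ls(stocks), envir = stocks, file = tempfile(fileext = ".RData"))
```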
> We avoid data.frame for any object that doesn't absolutely require the
> mixed types and factor support of data.frame. It's too inefficient in
> memory and speed for truly large data.
> - Brian
> Brian G. Peterson
> Ph: 773-459-4973
> IM: bgpbraverock
> On 04/15/2011 09:19 AM, Elliot Joel Bernstein wrote:
>> Jeff -
>> Thanks for your feedback. I was attempting to use a data.frame, and that --
>> specifically the use of the 'merge' function -- seemed to be the root of
>> the problem. I read the xts vignette, and it looks interesting, but it's
>> not clear how I should use it for my data. The example in the vignette
>> (using 'sample_matrix') seems to have several variables ('Open', 'Close',
>> etc.) measured over time for a single stock. How would you handle
>> multiple variables measured on multiple stocks over time? Ideally I think
>> I would like to have multiple matrices contained in the xts object, one
>> for each variable, with rows indexing time and columns indexing stocks
>> (or a 3-D array, with the third dimension indexing the variable).
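The layout Elliot describes can be sketched in base R with a 3-D array, rows indexing time, columns indexing stocks, and the third dimension indexing the variable (the dates, tickers, and values below are made up):

```r
dates  <- as.character(as.Date("2011-01-03") + 0:2)
stocks <- c("AAPL", "IBM")
vars   <- c("Open", "Close")

# 3-D array: time x stock x variable
a <- array(NA_real_, dim = c(3, 2, 2), dimnames = list(dates, stocks, vars))
a["2011-01-03", "AAPL", "Close"] <- 100.1

close_mat <- a[, , "Close"]   # dates-by-stocks matrix for one variable
```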
>> - Elliot
>> On Sun, Apr 10, 2011 at 02:14:42PM -0500, Jeffrey Ryan wrote:
>>> One of the advantages to posting to the finance list is that those of
>>> us who work around large data in finance can comment on tools that you
>>> use as well.
>>> One thing you didn't mention specifically was which packages you are
>>> using and maybe examples of specific code you are calling.
>>> Within financial time-series, one of the most optimized tools is xts -
>>> precisely for the reason of memory management and optimizations for
>>> large data. Using something ad hoc - for example, strings and
>>> data.frames - can cause tremendous issues.
>>> Another issue is whether you need the full data resident in memory at
>>> all times. R's rds format, a database, or out-of-core objects (such as
>>> with mmap or indexing) can greatly improve things.
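A small sketch of the rds route Jeff mentions: write a large object to disk, drop the in-memory copy, and reload it only when needed (the object here is random filler data sized like the 800 x 3000 case in the thread):

```r
# Persist a large matrix with R's rds format
x <- matrix(rnorm(800 * 3000), nrow = 3000, ncol = 800)
path <- tempfile(fileext = ".rds")
saveRDS(x, path)

rm(x); invisible(gc())   # release the in-memory copy
y <- readRDS(path)       # reload on demand
dim(y)                   # 3000 800
```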
>>> If you are able to come to the R/Finance conference in Chicago on the
>>> 29th and 30th of this month, you'll have a chance to talk to some of
>>> those 'in the trenches' with respect to using R on big data. And as
>>> you point out (as well as Brian) - 800x3000 isn't very large, so your
>>> case isn't unique.
>>> Would be great to see you later this month in Chicago!
>>> On Sun, Apr 10, 2011 at 10:49 AM, Elliot Joel Bernstein
>>> <elliot.bernstein at fdopartners.com> wrote:
>>>> This is not specifically a finance question, but I'm working with
>>>> data (daily stock returns), and I suspect many people using R for
>>>> analysis face similar issues. The basic problem I'm having is that with a
>>>> moderately large data set (800 stocks x 11 years), performing a few
>>>> operations such as data transformations, fitting regressions, etc.,
>>>> in R uses an enormous amount of memory -- sometimes upwards of 5GB --
>>>> even after using gc() to try and free some memory up. I've read several
>>>> posts to various R mailing lists over the years indicating that R does
>>>> not return memory to the system on certain OSs (64-bit Linux in my
>>>> case), so I understand that this is "normal" behavior for R. How do people
>>>> work around this to do exploratory analysis on large data sets without
>>>> having to constantly restart R to free up memory?
>>>> - Elliot Joel Bernstein
>>> Jeffrey Ryan
>>> jeffrey.ryan at lemnica.com
>>> R/Finance 2011 April 29th and 30th in Chicago | www.RinFinance.com