[R-SIG-Finance] Preprocessing RData file (Was: Kdb (Was: high frequency data analysis in R))

Jeff Ryan jeff.a.ryan at gmail.com
Thu May 21 20:22:38 CEST 2009


I have given some thought to both ff and bigmemory.  I am not a huge
fan of the "ff" license.

http://cran.r-project.org/web/packages/ff/LICENSE

bigmemory is interesting in that you can bypass the R memory issues on
Windows, but I haven't had incredible luck with it.  Me and C++ don't
like each other, so maybe it is something related to that :).  I can
get around the Windows issues by using something non-Windows...

Supposedly the changes in the most recent bigmemory are quite good,
but trying the shared memory route (one of the biggest reasons I would
like to use it) has ended in catastrophic failure for me.

At the end of the day there is no good way to make xts rely on
bigmemory. Since so much of the xts code is in C, you can't readily
operate on the external pointers from there.  You need to read in the
data (via `[` ) and at that point it is resident in the R process, so
you are only getting the penalty of the memory allocation, and none of
the gain.

Of course this is my 2c.  Maybe we need another time-series library :)

Jeff




On Thu, May 21, 2009 at 1:10 PM, Rowe, Brian Lee Yung (Portfolio
Analytics) <B_Rowe at ml.com> wrote:
> ... and possibly even a list change.
>
> Do you plan on making this compatible with ff or bigmemory? Seems like this theme is making its rounds.
>
> Brian
>
> -----Original Message-----
> From: Jeff Ryan [mailto:jeff.a.ryan at gmail.com]
> Sent: Thursday, May 21, 2009 1:58 PM
> To: Rowe, Brian Lee Yung (Portfolio Analytics)
> Cc: Dirk Eddelbuettel; Hae Kyung Im; r-sig-finance at stat.math.ethz.ch
> Subject: Re: [R-SIG-Finance] Preprocessing RData file (Was: Kdb (Was: high frequency data analysis in R))
>
>
> I feel like I should change the title again... :)
>
> First off, RData files are compressed by default. If you don't want
> the gzip overhead, turn the compression off.
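
A rough sketch of the compression point, using only base R: save()
takes a compress argument, so the gzip step can be switched off. The
object x and the temp files are made up for illustration, and timings
will vary with the machine and the data.

```r
# Illustrative data only: one million random doubles.
x <- rnorm(1e6)

f_gz  <- tempfile(fileext = ".RData")
f_raw <- tempfile(fileext = ".RData")

save(x, file = f_gz)                     # default: gzip-compressed
save(x, file = f_raw, compress = FALSE)  # skip the compression step

# The uncompressed file is larger on disk but needs no gunzip on load.
file.info(c(f_gz, f_raw))$size
system.time(load(f_gz))
system.time(load(f_raw))
```

Whether the uncompressed file loads faster depends on whether the disk
or the CPU is the bottleneck, so it is worth timing on the real data.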
>
> The xts format 'on-disk' is nothing more than the structure from
> memory written to disk.  This manages to be both faster and smaller
> on disk.  It isn't a huge gain, but it allows for binary searching
> of the index to get to the data you want.
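
To make the binary-search idea concrete, here is a minimal sketch in
base R. The file layout is an assumption invented for this example (it
is NOT the actual xts on-disk format), and write_series() and
read_from() are hypothetical helpers. The point is only that a sorted
index stored as raw doubles can be searched with a handful of seeks
instead of reading the whole file.

```r
# Hypothetical layout (an assumption, not the xts format):
#   [n as 4-byte integer][n sorted doubles: index][n doubles: values]
write_series <- function(path, index, values) {
  stopifnot(length(index) == length(values), !is.unsorted(index))
  con <- file(path, "wb")
  on.exit(close(con))
  writeBin(length(index), con, size = 4L)
  writeBin(as.double(index), con)
  writeBin(as.double(values), con)
}

# Return the value at the first index entry >= key, or NA if there is
# none. Only O(log n) index entries are read from disk.
read_from <- function(path, key) {
  con <- file(path, "rb")
  on.exit(close(con))
  n <- readBin(con, "integer", size = 4L)
  idx_at <- function(i) {            # i is 1-based
    seek(con, 4 + 8 * (i - 1))
    readBin(con, "double")
  }
  lo <- 1L
  hi <- n + 1L                       # search the half-open range [lo, hi)
  while (lo < hi) {
    mid <- (lo + hi) %/% 2L
    if (idx_at(mid) < key) lo <- mid + 1L else hi <- mid
  }
  if (lo > n) return(NA_real_)
  seek(con, 4 + 8 * n + 8 * (lo - 1))  # jump to the matching value
  readBin(con, "double")
}

# Usage: 1000 one-second timestamps with made-up values 1..1000.
path <- tempfile(fileext = ".bin")
idx  <- as.numeric(as.POSIXct("2009-05-21 09:30:00", tz = "UTC")) + 0:999
write_series(path, idx, seq_along(idx))
read_from(path, idx[500])
```

A gzip-compressed file cannot be searched this way without
decompressing it first, which is the other reason to skip compression.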
>
> I will put together a performance comparison at some point, and pass it along.
>
> Jeff
>
> On Thu, May 21, 2009 at 12:52 PM, Rowe, Brian Lee Yung (Portfolio
> Analytics) <B_Rowe at ml.com> wrote:
>> Is there any literature on the relative performance gain of
>> preprocessing data into RData and then reading into R? Does it break
>> down anywhere? I have 4 GB of data that I'm reading in and I/O is a
>> large bottleneck.
>>
>> Brian
>>
>>
>> -----Original Message-----
>> From: r-sig-finance-bounces at stat.math.ethz.ch
>> [mailto:r-sig-finance-bounces at stat.math.ethz.ch] On Behalf Of Dirk
>> Eddelbuettel
>> Sent: Thursday, May 21, 2009 1:42 PM
>> To: Hae Kyung Im
>> Cc: r-sig-finance at stat.math.ethz.ch
>> Subject: [R-SIG-Finance] Kdb (Was: high frequency data analysis in R)
>>
>>
>>
>> On 21 May 2009 at 11:13, Hae Kyung Im wrote:
>> | access (query) this huge database. I looked a little bit into kdb but
>> | you have to pay ~25K to buy the software for one processor. I haven't
>>
>> True, but you can have "free" (as in beer) 32bit version that times out
>> after two hours. That's not a bad compromise.
>>
>> I looked at it for a bit, and it has an R interface. (My blog has a
>> patch to fix their then-broken interface to R's Datetime; I think they
>> may have integrated that by now).  Then again you can also pre-process
>> into RData files, or use hdf5, or use a gazillion other methods.  But
>> the free trial version may just help for the odd research project like
>> the one Haky described.
>>
>> Dirk
>>
>> --
>> Three out of two people have difficulties with fractions.
>>
>> _______________________________________________
>> R-SIG-Finance at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>> -- Subscriber-posting only.
>> -- If you want to post, subscribe first.
>
>
>
> --
> Jeffrey Ryan
> jeffrey.ryan at insightalgo.com
>
> ia: insight algorithmics
> www.insightalgo.com
>


