[R-SIG-Finance] Preprocessing RData file (Was: Kdb (Was: high frequency data analysis in R))

Rowe, Brian Lee Yung (Portfolio Analytics) B_Rowe at ml.com
Thu May 21 20:10:47 CEST 2009


... and possibly even a list change.

Do you plan on making this compatible with ff or bigmemory? It seems like this theme is making the rounds.

Brian 

-----Original Message-----
From: Jeff Ryan [mailto:jeff.a.ryan at gmail.com] 
Sent: Thursday, May 21, 2009 1:58 PM
To: Rowe, Brian Lee Yung (Portfolio Analytics)
Cc: Dirk Eddelbuettel; Hae Kyung Im; r-sig-finance at stat.math.ethz.ch
Subject: Re: [R-SIG-Finance] Preprocessing RData file (Was: Kdb (Was: high frequency data analysis in R))


I feel like I should change the title again... :)

First off, RData files are compressed by default. If you don't want the
gzip overhead, turn compression off when you save them.
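
As a rough sketch of what dropping the gzip step looks like, the snippet
below saves the same object twice with base R's save(), once with the
default compression and once with compress = FALSE, and checks that both
files load back unchanged. The matrix x is made-up stand-in data, not
anything from this thread, and no timings are claimed here; whether the
uncompressed file loads faster will depend on your disk and your data.

```r
# Illustrative only: compressed vs. uncompressed .RData files.
set.seed(42)
x <- matrix(rnorm(1e5), ncol = 10)        # made-up stand-in data

f_gz  <- tempfile(fileext = ".RData")
f_raw <- tempfile(fileext = ".RData")

save(x, file = f_gz)                      # default for binary saves: compressed
save(x, file = f_raw, compress = FALSE)   # skip the gzip step

# File sizes in bytes, in the order (compressed, uncompressed).
sizes <- file.info(c(f_gz, f_raw))$size

# Load each file into its own environment and confirm the round trip.
e_gz  <- new.env()
e_raw <- new.env()
load(f_gz,  envir = e_gz)
load(f_raw, envir = e_raw)
round_trip_ok <- identical(e_gz$x, x) && identical(e_raw$x, x)
```

Wrapping each load() call in system.time() on your own files is the
quickest way to see whether compression is the bottleneck.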

The xts format 'on-disk' is nothing more than the structure from
memory written to disk.  This manages to be both faster and more
compact.  It isn't a huge gain, but it allows for binary searching
of the index to get to the data you want.
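
To illustrate why a sorted index helps, here is a minimal sketch of a
binary search over a sorted time index held in memory. This is not the
xts on-disk format or its code; lower_bound() is a hypothetical helper
written for this example, and the index and values are made up. The
point is only that a time window can be located in O(log n) comparisons
instead of scanning every row.

```r
# Hypothetical helper: first position i with index[i] >= target,
# or length(index) + 1 if there is none. Assumes index is sorted.
lower_bound <- function(index, target) {
  lo <- 1L
  hi <- length(index) + 1L
  while (lo < hi) {
    mid <- lo + (hi - lo) %/% 2L
    if (index[mid] < target) lo <- mid + 1L else hi <- mid
  }
  lo
}

# Made-up data: one observation per second starting at 09:30:00 UTC.
index  <- as.POSIXct("2009-05-21 09:30:00", tz = "UTC") + 0:9999
values <- seq_along(index)

from <- as.POSIXct("2009-05-21 10:00:00", tz = "UTC")
to   <- as.POSIXct("2009-05-21 10:00:05", tz = "UTC")

# Half-open window [from, to): two searches, then one contiguous slice.
first  <- lower_bound(index, from)
last   <- lower_bound(index, to) - 1L
window <- values[first:last]
```

With an on-disk layout the same idea would apply to seeking within the
file rather than subsetting a vector, but that part is not shown here.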

I will put together a performance comparison at some point and pass it along.

Jeff

On Thu, May 21, 2009 at 12:52 PM, Rowe, Brian Lee Yung (Portfolio
Analytics) <B_Rowe at ml.com> wrote:
> Is there any literature on the relative performance gain of
> preprocessing data into RData and then reading into R? Does it break down
> anywhere? I have 4 GB of data that I'm reading in and I/O is a large
> bottleneck.
>
> Brian
>
>
> -----Original Message-----
> From: r-sig-finance-bounces at stat.math.ethz.ch
> [mailto:r-sig-finance-bounces at stat.math.ethz.ch] On Behalf Of Dirk
> Eddelbuettel
> Sent: Thursday, May 21, 2009 1:42 PM
> To: Hae Kyung Im
> Cc: r-sig-finance at stat.math.ethz.ch
> Subject: [R-SIG-Finance] Kdb (Was: high frequency data analysis in R)
>
>
>
> On 21 May 2009 at 11:13, Hae Kyung Im wrote:
> | access (query) this huge database. I looked a little bit into kdb but
> | you have to pay ~25K to buy the software for one processor. I haven't
>
> True, but you can have a "free" (as in beer) 32-bit version that times out
> after two hours. That's not a bad compromise.
>
> I looked at it for a bit, and it has an R interface. (My blog has a patch to
> fix their then-broken interface to R's Datetime; I think they may have
> integrated that by now.)  Then again you can also pre-process into RData
> files, or use hdf5, or use a gazillion other methods.   But the free trial
> version may just help for the odd research project like the one Haky
> described.
>
> Dirk
>
> --
> Three out of two people have difficulties with fractions.
>
> _______________________________________________
> R-SIG-Finance at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> -- Subscriber-posting only.
> -- If you want to post, subscribe first.
>



-- 
Jeffrey Ryan
jeffrey.ryan at insightalgo.com

ia: insight algorithmics
www.insightalgo.com


