[R-SIG-Finance] R + HDF5 + Pytables

Daniel Cegiełka daniel.cegielka at gmail.com
Tue May 18 16:23:36 CEST 2010


Manoj, this is not a finance topic; you should send it to the
r-sig-hpc list.

> Hopefully we could do a comparison/benchmarking of a few different
> alternatives (including commercial tools like kdb).

indexing is still under development, but the ability to work at high
performance with terabytes of tick data was one of the primary design
goals of the package. Inside the xts code you can find nicely optimized
C code for low latency and high performance. And when you combine xts
with the indexing package you can compare it even with kdb... (another
point: you can use indexing as shared memory for many R instances).

indexing will work nicely even with many TB of tick data, and you don't
have the latency of the TCP stack (as with kdb).

It probably only needs a good compression solution...

regards,
daniel


On 18 May 2010 at 05:56, Manoj <manojsw at gmail.com> wrote:
> Daniel - that's interesting feedback.
>
> Jeff: I did a quick search on the indexing package and it seems it's still
> in the development stage; it looks very promising though. I am more than
> happy to test it out and give feedback/suggestions.
>
> Hopefully we could do a comparison/benchmarking of a few different
> alternatives (including commercial tools like kdb).
>
> Manoj
>
> 2010/5/18 Daniel Cegiełka <daniel.cegielka at gmail.com>:
>> Hi Manoj
>> I tested hdf5 with R and in my opinion it makes no sense to use it
>> with xts/zoo for tick data.
>> If you work in R, it is much better to store xts objects (or
>> R objects in general) directly on disk (it's simpler and faster).
>>
>> Check Jeff Ryan's packages:
>> RBerkeley: https://r-forge.r-project.org/projects/rberkeley/
>> indexing: http://r-forge.r-project.org/projects/indexing/
>>
>> example for RBerkeley:
>>
>> library(RBerkeley)
>> library(xts)
>>
>> bdb <- db_create()
>> db_open(bdb, file='blotter.db')   # open the database file on disk
>>
>> # and a sample query: column 4 of GOOG over a date range
>> unserialize(db_get(bdb, key='GOOG'))['2010-02-17::2010-02-25', 4]
>>
>>
>> If you need an ultra-fast solution, you should try Jeff's indexing package ;)
>>
>> regards,
>> daniel
>>
>>
>>
>>
>> 2010/5/17 Manoj <manojsw at gmail.com>
>>>
>>> Dear All,
>>>       I have created an HDF5 file using Python + Pytables. The HDF5
>>> file stores tick data and as such is quite large. I am planning
>>> to use the R/zoo/xts combination for analytics. The tricky bit is that I
>>> am unable to find a good wrapper to access/query the HDF5 file created by
>>> Pytables (keeping intact all the nice features of the HDF5 file, such as
>>> indices). The hdf5 library in R wouldn't help given the size of
>>> the file.
>>>
>>>      One (crude) option is to query data using Python/Pytables, write
>>> to an output file and invoke R for analytics. The question is - could
>>> this task be done in a more efficient fashion? Is there a good
>>> HDF5/Pytables wrapper that could help me do the task completely within
>>> R?
>>>
>>>     Any tips/suggestions would be greatly appreciated.
>>>
>>> Thanks.
>>>
>>> Manoj
>>>
>>> _______________________________________________
>>> R-SIG-Finance at stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>>> -- Subscriber-posting only. If you want to post, subscribe first.
>>> -- Also note that this is not the r-help list where general R questions should go.
>>
>
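
The RBerkeley query quoted above depends on two things that can be tried
without any database: an R object round-trips through serialize() and
unserialize() as raw bytes, and xts accepts a 'from::to' date-range string
when subsetting. The following is only a minimal sketch under stated
assumptions: the xts package is installed, and the sample series, its
column names and the file path are hypothetical stand-ins, not data from
this thread.

```r
library(xts)  # assumption: the xts package is installed

# Hypothetical sample data standing in for a stored price series.
dates  <- as.Date("2010-02-15") + 0:14
values <- matrix(as.numeric(seq_len(15 * 4)), ncol = 4,
                 dimnames = list(NULL, c("Open", "High", "Low", "Close")))
prices <- xts(values, order.by = dates)

# A key-value store holds raw bytes, so the object is serialized first.
raw_bytes <- serialize(prices, connection = NULL)

# Reading back: unserialize the raw vector, then subset by date range
# and column, in the same way as the quoted query.
restored <- unserialize(raw_bytes)
closes   <- restored["2010-02-17::2010-02-25", 4]

# The same round trip through a plain file on disk (hypothetical path).
path <- file.path(tempdir(), "GOOG.rds")
saveRDS(prices, path)
from_disk <- readRDS(path)
```

With a key-value store such as Berkeley DB, raw_bytes would be the value
stored under a key like 'GOOG'; the plain-file variant is the simplest
form of keeping xts objects directly on disk.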


