[R-SIG-Finance] R-SIG-Finance Digest, Vol 40, Issue 16

icosa atropa icos.atropa at gmail.com
Sun Sep 23 02:45:23 CEST 2007


I settled on Postgresql to store my timeseries prior to R processing,
and I've been very pleased with my choice.  It has extensive datatype
support - In particular, I've used POSIX , float, text, and even GIS
extensions with great success.  Postgresl documentation is __very__
good.  Admin requires some time and resources (not VPS friendly by
default), but it's been a very satisfying learning experience for me.

I consider postgresql the "designers" choice of database to interface
with R.  It supports cool features like foreign keys and views that I
now take for granted.  As previously mentioned, server-side use of
perl and/or python functions defined and triggered within the database
can be very useful. Overall, if the product is to be used long-term,
by more than one person, or is developed by someone who enjoys
programming, then I consider Postgreql a very good choice.

I use RODBC to connect R to postgresql (extra datatype coersion can
occur here).  The unixODBC docs (dongle between RODBC and postgresql)
are decent and generally complete.  I preprocess my data with Perl -
the perl database interface docs are high-quality; i imagine python is
similar.

Some interesting advantages of postgresql (also apply to mysql?) that
I've found:
I can easily interface with different front ends
  - php and html
  - Openoffice base
  - psql, etc

A final note - Relational databases don't "understand" timeseries per
se, and are not guaranteed to return rows in a given order unless
specifically requested.  Adding a "unique" constraint on the time
index column and sorting on the time index gives the table the
appearance of being a timeseries.  As far as the database is
concerned, however, its just a heap of tuples.   Hence the importance
of pre-processing text data with a text-friendly language like python
or perl before db insertion.

hope this helps,
christian

> paul sorenson wrote:
> > I am collecting hundreds of time series and storing them in sqlite.  I
> > am downloading and storing with python then reading them out with R.
>
> Just for the archive:  yes MySQL, PostgreSQL, Oracle, DB2, etc can store
> data types better than sqlite.  As Paul notes, the main reason to use
> sqlite is that you don't need to actually administer a real database
> environment.
>
>    - Brian



More information about the R-SIG-Finance mailing list