[R-sig-finance] Backtest trading strategies

Steve Miller steve.miller at jhu.edu
Mon Nov 28 15:27:21 CET 2005

My company delivers broad business intelligence solutions on foundations of
data warehouses and marts that grow to hundreds of gigabytes. It is
therefore critical that we optimize data storage and the ETL -- extract,
transform, and load -- processes for very large data. Our minimal open
source analytics solution consists of: MySQL/Postgres (or Oracle et al., if
standard) repositories that house the sourcing data, assuring one version of
the truth; Perl/Python for "munging" potentially very large, complicated,
and messy input files, consolidating data across disparate sources,
performing array and date calculations on millions of records,
inserting/updating the databases, reporting, and sourcing data sets for
input to analytics; and R for graphics, statistics, and analytical
calculations. While R is a very powerful and rich language, we do not see
its strength in parsing ugly and error-prone data files, nor do we find its
interpreted speed adequate for very large data management. Once the data
stores are built and validated, we turn them over to users of R and other BI
tools such as Pentaho. We are very encouraged by the acceptance of both this
approach and R that we are starting to see in the commercial world.

Steve Miller 
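
The "munge, validate, load" step described above can be sketched roughly as
follows. This is only an illustration, not the pipeline from the message: the
sample data, the table name `prices`, the accepted date formats, and the
helper names are all hypothetical, and an in-memory SQLite database stands in
for a MySQL/Postgres repository so that the sketch runs with the Python
standard library alone.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical messy input: mixed date formats, stray whitespace, bad rows.
RAW = """symbol,date,close
IBM, 2005-11-25 ,88.80
IBM,11/28/2005,89.10
MSFT,2005-11-25,n/a
,2005-11-25,27.76
MSFT,2005-11-28,27.75
"""

DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")  # assumed formats, for illustration


def parse_date(text):
    """Return an ISO date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_rows(raw_text):
    """Yield (symbol, iso_date, close) tuples, skipping rows that fail validation."""
    for row in csv.DictReader(io.StringIO(raw_text)):
        symbol = (row.get("symbol") or "").strip()
        date = parse_date(row.get("date") or "")
        try:
            close = float(row.get("close") or "")
        except ValueError:
            close = None
        if symbol and date and close is not None:
            yield symbol, date, close


def load(raw_text, conn):
    """Create the table if needed and insert the cleaned rows; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prices ("
        "symbol TEXT, date TEXT, close REAL, PRIMARY KEY (symbol, date))"
    )
    rows = list(clean_rows(raw_text))
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT OR REPLACE INTO prices VALUES (?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")  # stand-in for a MySQL/Postgres repository
loaded = load(RAW, conn)
```

Rows with a missing symbol or an unparseable price are dropped before the
insert, so the table handed on to R users holds only validated records. A
real pipeline would presumably log the rejected rows rather than discard them
silently.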

-----Original Message-----
From: r-sig-finance-bounces at stat.math.ethz.ch
[mailto:r-sig-finance-bounces at stat.math.ethz.ch] On Behalf Of Gabor
Sent: Saturday, November 26, 2005 7:14 AM
To: paul sorenson
Cc: r-sig-finance at stat.math.ethz.ch
Subject: Re: [R-sig-finance] Backtest trading strategies

On 11/26/05, paul sorenson <sourceforge at metrak.com> wrote:
> Gabor Grothendieck wrote:
> > On 11/26/05, Rob Steele <rfin.20.phftt at xoxy.net> wrote:
> >
> >>Neuro LeSuperHéros wrote:
> >>
> >>>Hello,
> >>>
> >>>I understand the utility of MySQL for data storage.  But why is Python
> >>>essential?  What does it do that R can't do for system
> >>>creation/calculation?
> >>>
> >>>Thanks
> >>
> >>
> >>Python is great for parsing data from wherever you get it and populating
> >>databases.  MySQL is ideal for the write-once-read-thereafter scenario
> >>that research implies.  You can use R for the initial data marshaling
> >>if you'd rather not learn another language but Python seems like a
> >>better fit for that sort of thing.  It's a scripting language that
> >>integrates more naturally into its host environment.  For analysis and
> >>visualization however, R absolutely rules.
> >
> >
> > I don't use MySQL so won't comment on that part but for parsing
> > data I have found R to have everything I need.  I used to use perl
> > but now use R exclusively.    R's string manipulation includes
> > regular expressions and the vector processing often simplifies
> > string manipulation by eliminating loops over lines or vectors
> > of strings.
> >
> > To me it's much easier to maintain code if it's all in one language and
> > moving to R has enabled me to replace a bunch of perl, batch files
> > and other statistical software with R which really helps clean it
> > all up.  (Actually I still have some Windows batch files, see
> > http://cran.r-project.org/contrib/extra/batchfiles/, but they are only
> > for generic configuration utilities and nothing specific to any
> Each to their own I guess.  I happen to be much more familiar with
> Python than R and often use it to grab data in various formats which R
> won't read.  I wouldn't dream of using an MSDOS batch file.  As I learn
> more about R, I tend to do more in it but I couldn't imagine myself
> parsing dodgy HTML, for example, with it.

Actually I use R for parsing HTML and for parsing XML too.  I do
agree with Rob that it would be nice if R worked better with shells
and also wish I could write small self-contained executables in R
like one can in Tcl and Python.
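
For readers wondering what the "dodgy HTML" case from this exchange looks
like in practice, here is a minimal Python sketch using the standard
library's html.parser module. The page fragment and the `CellCollector` class
are hypothetical; the sketch does not reproduce anyone's actual code from
this thread, and R offers its own routes to the same result.

```python
from html.parser import HTMLParser

# Hypothetical malformed page fragment: unclosed <td>/<tr>, unquoted attribute.
DODGY_HTML = """
<table border=1>
<tr><td>IBM<td>88.80
<tr><td>MSFT<td>27.76
</table>
"""


class CellCollector(HTMLParser):
    """Collect table cell text row by row, tolerating unclosed tags."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])      # a new <tr> implicitly ends the previous row
            self.in_cell = False
        elif tag == "td" and self.rows:
            self.rows[-1].append("")  # a new <td> implicitly ends the previous cell
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "tr", "table"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1] += data.strip()


parser = CellCollector()
parser.feed(DODGY_HTML)
parser.close()
rows = parser.rows
```

The parser never requires closing tags: each new opening tag is treated as
ending whatever came before it, which is one simple way to cope with markup
of this kind.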
