[R-SIG-Finance] Quantstrat - applyRule

Brian G. Peterson brian at braverock.com
Tue Oct 4 16:07:23 CEST 2011


On Tue, 2011-10-04 at 17:25 +1100, Roupell, Darko wrote:
> Hi  Peter Carl, Dirk Eddelbuettel, Brian G. Peterson, Jeffrey Ryan, and Joshua Ulrich,

'Dear quantstrat authors' would have been sufficient ;)

> Great work so far on quantstrat.

Thanks, I'm glad you're finding it useful.

> That said, I would highly appreciate your input on an issue that 
> I came across using quantstrat.
> 
> Namely, it is related to how to handle the powerful combination of 
> quantstrat and genetic algorithm packages in R (e.g. GALGO) 
> to achieve an optimal adaptive trading strategy.

I'll certainly agree that parameter optimization is desirable, but we'll
need to come back to a discussion of how to go about adaptive
parameterization of strategies.

> The problem here is that quantstrat already has some performance issues 
> with multiple assets (as per the demo example for RSI and 9 stocks), where 
> it can take up to several minutes for the applyStrategy function to execute.
> 
> > end_t<-Sys.time()
> > print("Strategy Loop:")
> [1] "Strategy Loop:"
> > print(end_t-start_t)
> Time difference of 6.861233 mins

The RSI demo is about the worst possible torture test for quantstrat
other than tick data, which obviously I can't distribute due to
contracts with data vendors.

First, of course, your mileage may vary regarding execution time:

On a stock i7 machine running at 3.2GHz with hyperthreading turned on
(so running considerably though variably slower than the rated 3.2GHz),
I get:

> print("Strategy Loop:")
[1] "Strategy Loop:"

> print(end_t-start_t)
Time difference of 2.943595 mins

and on an overclocked 4.13GHz machine with lots of RAM, my time this
morning was under 2 minutes with other stuff going on on that machine as
well.  On the slowest machine that I still have R installed on (a single
2GHz core), it took 11.00175 mins.  So obviously one early step is to
get a faster CPU, optimized single-threaded BLAS, etc.

So, before I continue, I'll go into a bit of *why* the RSI demo is the
most CPU-intensive demo in the package, and what could be done without
changing the package to speed it up.

As is discussed at relative length in the documentation, indicator and
signal generation are non-path-dependent, and should be constructed in a
vectorized (or even compiled) fashion to make them as fast as possible.
They know nothing about positions, transactions, etc, and so can be made
to run quickly.  The *path-dependent* *loop* happens in applyRules
(which is also discussed at length in the documentation). 
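
As a hedged illustration of that split, here is roughly what the
non-path-dependent portion looks like, loosely following the RSI demo
shipped with the package (names and parameter values are illustrative):

library(quantstrat)

stratRSI <- strategy("RSI")

## the indicator is computed over the entire series in one vectorized
## (and internally compiled, via TTR) pass; it knows nothing about
## positions or transactions
stratRSI <- add.indicator(strategy = stratRSI, name = "RSI",
    arguments = list(price = quote(getPrice(mktdata))),
    label = "RSI")

## the signal is likewise a single vectorized comparison on the
## indicator column
stratRSI <- add.signal(strategy = stratRSI, name = "sigThreshold",
    arguments = list(threshold = 70, column = "RSI",
                     relationship = "gt", cross = TRUE),
    label = "RSI.gt.70")

Everything to this point runs in a handful of vectorized passes; only
the rules that consume these signals need the path-dependent loop.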

Before anyone complains, I'm aware that you could completely vectorize
an OHLC backtest assuming that any High or Low that went through your
target 'limit' price had gotten filled.  If that's your world, great.
It isn't my world.  In my world, we place orders into markets sometimes
involving multiple instruments at once, and act on high-frequency data.
quantstrat works reasonably well and quickly with OHLC data, as the
demos should attest, and gives a huge boost in business abstraction
when constructing your strategies.  Even on tick data, we routinely get
under 1 minute per core per day of tick data for our backtests, which
isn't bad at all. (though I am very interested in improving that
further!)

There's nothing we can do about the path-dependent part.  If you want a
reasonably accurate backtest, you need to do this in a path-dependent
way.  We need to be aware of our current position, prior transactions,
etc.  We need to be stateful, rather than stateless, at this point in
the strategy evaluation.

The RSI demo was constructed to generate a *lot* of signals.  It is
supposed to be a torture test for quantstrat. Inside applyRules (which
will be one of the primary opportunities to speed things up, more on
that later), we create a dimension reduction index ('dindex' in the
code) to cut down the number of places we need to check to see if the
strategy needs to do something.  

You found the reference to the dindex loop in the documentation here:

> #' The solution we've employed makes use of what we know about the strategy and
> #' the orders the strategy places (or may place) to reduce the dimensionality of the problem.


We still need to check every TRUE (or whatever 'sigval' you use) to see
if the strategy needs to do something.  So, one very reasonable approach
to speeding things up is to generate fewer signals.  Make the signals
that you do generate have more value.  Generating fewer signals yields a
nearly linear improvement in quantstrat's speed, proportional to the
number of signals.  Adding the dimension reduction index improved
performance by an order of magnitude or more for most strategies.
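
To make that concrete (a hedged sketch, reusing the strategy object from
the earlier example): a signal defined with cross=TRUE fires only on the
bar where the threshold is first crossed, rather than on every bar where
the condition holds, so the dindex loop has far fewer rows to examine.

## fires on every bar where RSI < 30: many signals for applyRules to check
stratRSI <- add.signal(strategy = stratRSI, name = "sigThreshold",
    arguments = list(threshold = 30, column = "RSI",
                     relationship = "lt", cross = FALSE),
    label = "RSI.lt.30.every")

## fires only on the crossing bar: far fewer signals, same information
stratRSI <- add.signal(strategy = stratRSI, name = "sigThreshold",
    arguments = list(threshold = 30, column = "RSI",
                     relationship = "lt", cross = TRUE),
    label = "RSI.lt.30.cross")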

Next, we get to the loop.  R is pretty awful at loops, as is documented
all over the place.  Eventually, the dindex loop inside applyRules will
probably get rewritten in C or C++, and compiled.  This should save a
lot of time.  Why haven't we done this already?  Two reasons: 

1> I'm not sure the logic is done/mature/stable yet.
As soon as that loop is rewritten in C/C++, it gets a lot harder to
modify the logic.  R is very simple to work in, being a scripting
language.  C/C++ both have more overhead regarding allocation, logic,
teardown, and communication to R.  So it doesn't necessarily make a lot
of sense to rewrite in a compiled language until we're pretty happy with
the logic of how things are working.

2> No one has volunteered and supplied patches which do this.
We'd really like someone to do that, pretty please?  With a pony on top?
All of the authors of quantstrat work professionally in financial
services. We have deadlines, portfolios, P&L's, and staff members to
take care of before writing open source code.  I use quantstrat nearly
every day in my 'day job', and in fact created it to improve my
productivity, which it has done.  That doesn't mean I have a ton of time
to improve the infrastructure, unless I'm doing so to solve a problem I
am having right now on a real strategy or portfolio.  

So now, if we've reduced our signals as much as is reasonable, and can't
do anything additional to the loop for now, what can we do to speed
things up?

*Use more cores.*
If you look at the applyParameter function, which was helpfully
contributed to quantstrat by Yu Chen this summer as part of his GSoC
project, you'll see that he uses foreach to parallelize execution of
parameter sets.  This will be useful in your genetic algorithm
investigations, as we'll see.  For now, it should suffice to say that in
normal use, I tend to use a foreach loop on a cluster to give each
worker node one symbol to work on.  This gives nearly linear speedup by
number of cores/workers applied.  applyParameter contains some
additional cleanup code to bring everything back together, but I have
written many wrappers for production strategies that utilize foreach to
split up the problem into massively parallel bits, and run each of these
on a separate worker.
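
A minimal sketch of that per-symbol pattern, assuming a registered
foreach backend and that each worker has the data and strategy objects
available (portfolio initialization and the gathering/cleanup step that
applyParameter performs are omitted here):

library(foreach)
## register a parallel backend first, e.g. library(doMC); registerDoMC(8)
## with no backend registered, foreach falls back to sequential execution

symbols <- c("AAPL", "MSFT", "IBM")   # hypothetical symbol list

out <- foreach(sym = symbols, .packages = "quantstrat") %dopar% {
    ## each worker evaluates the strategy against a one-symbol portfolio
    applyStrategy(strategy = stratRSI,
                  portfolios = paste("RSI", sym, sep = "."))
    ## return the worker's portfolio so the master can reassemble results;
    ## blotter portfolios live in each worker's own environment
    getPortfolio(paste("RSI", sym, sep = "."))
}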
 
> Logically, adding a genetic algorithm wrapper function to test the result 
> with multiple variations of parameters for each indicator and for each 
> asset further impacts performance.  (In my case, with an initial 
> population of 50 and a max iteration of 100, it was still running after 
> 1.5 hours when I stopped R.)

You haven't provided a reproducible example, so I can't comment on what
you're doing in any detail.  Please *do* post a reproducible example.

I'll comment in general instead.

On a production portfolio of six main strategies and several thousand
total strategy configs, our monthly strategy parameter optimization run
took about three days on a cluster of machines.  On a different strategy
the weekly parameter optimization took most of the weekend running on 24
cores.  quantstrat is faster now than it was then, but this should be
indicative that simply waiting 1.5 hours may not be enough.

I think we will need more detail to understand what you're
trying to accomplish.  Yu's GSoC project ended before we could
incorporate a genetic algorithm (we would likely have used DEoptim, and
I couldn't find a link to a recent/current version of GALGO this
morning[1]).  One challenge here is the definition of an objective
function.  

What is the correct objective for an 'optimal' parameter configuration?
max Total.Realized.PL ?
min maxDrawdown       ?
max Sharpe Ratio      ?

or more likely, some arbitrary, layered, and changeable combination of
the above and some others.

This is an example of 'multi-objective optimization', and is far from a
trivial thing to get right.  The two most referenced approaches are: 

1> utilization of a penalized objective function
This is the approach I've taken in our PortfolioAnalytics package, where
we allow the user to construct an arbitrary layered objective, and apply
reasonable but changeable defaults to penalization.  It works pretty
well in the weights-based world of portfolio optimization, but I haven't
had the time to create a penalized objective function for quantstrat
parameter optimization (a rough sketch of what one might look like
follows after these two approaches).

2> utilization of a Pareto-optimal shrinkage frontier.
This approach will be much more complicated, and involves creating some
proxy function of the multi-objective space.  I think it is extremely
promising, and is covered in depth in the book 'Adaptive Differential
Evolution' by Zhang and Sanderson[2].
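
As promised above, a rough, heavily hedged sketch of the first
(penalized) approach for quantstrat: the stats argument is assumed to
carry the Total.Realized.PL and maxDrawdown measures named earlier, and
the tolerance and penalty weight are illustrative, not defaults from
any package.

## maximize realized P&L subject to a soft drawdown constraint:
## configurations whose drawdown exceeds the tolerance are penalized in
## proportion to the violation rather than discarded outright
objective <- function(stats, dd.limit = 0.20, penalty = 1e4) {
    score     <- stats$Total.Realized.PL
    violation <- abs(stats$maxDrawdown) - dd.limit
    if (violation > 0)
        score <- score - penalty * violation
    score
}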

This would be a great area for new contributions to quantstrat from you
or some other reader of R-SIG-Finance, as it is unlikely I will have the
time necessary to devote to a generalized implementation any time soon.

> It's worth saying that GALGO runs quickly on its own (e.g. 20-30 
> sec per asset) outside quantstrat, if applied to a Sharpe Ratio or simplistic 
> return calcs, but there are great benefits to incorporating it into quantstrat.

Agreed.  I think that the initial approach here will likely be similar
to the one that I used in PortfolioAnalytics, where we would create an
objective function, and then use code derived from the parallel version
of applyParameter to split things up into non-overlapping chunks.

> Coming back to the demo example for RSI, it appears that it takes up to 45 seconds 
> per stock to do calculations from 2007 to 2011, with applyStrategy and applyRules 
> being the largest drag on performance.

Correct (see the documentation and the discussion above).

> I noticed that setting path.dep to FALSE can improve performance, but it's fair 
> to say that most of the strategies I test are path-dependent.

I think it is a fair statement that most strategies are path-dependent.
I included the option to do things in a non-path-dependent way because,
for some strategies, it may make sense to write a rule function that
simply evaluates signals and generates transactions directly, which
would not need to be path-dependent.  That will likely be a custom rule
function for just one strategy, though, not a more general solution.
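
For completeness, a heavily hedged sketch of what such a one-off,
non-path-dependent rule might do: translate every signal directly into
a transaction via blotter's addTxn, with no reference to current
position or order state (the column, portfolio, symbol, and sizing are
all illustrative):

## valid only when fills genuinely do not depend on position or orders
sig.rows <- which(mktdata[, "RSI.lt.30.cross"] == 1)
for (i in sig.rows) {
    addTxn(Portfolio = "RSI.example", Symbol = "AAPL",
           TxnDate  = index(mktdata)[i],
           TxnQty   = 100,
           TxnPrice = as.numeric(Cl(mktdata)[i]),
           TxnFees  = 0)
}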

> The question here is: do you, for your own use, re-write the applyStrategy and 
> applyRules functions to get rid of parts that slow down their execution and may be 
> unnecessary?

No.  I use them as written, and speed them up as I have a chance.
This was the genesis of dindex, and some other optimizations done by
myself or Josh Ulrich at various points in time.

The ruleSignal function was written as an example.  I assumed that we
would write custom rule functions for all our 'real' strategies.  That
hasn't turned out to be the case: we find that we write custom indicator
functions and custom signal functions, but typically use ruleSignal to
evaluate them.  There's probably an opportunity to optimize performance
in ruleSignal, though I haven't looked at this.

It seems that you've already done some profiling on the code.  This will
be the main tool for further improvements.  Patches are always welcome.
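
For anyone inclined to contribute, base R's profiler is the natural
starting point; something like the following (portfolio.st named as in
the demos):

Rprof("quantstrat.prof")
applyStrategy(strategy = stratRSI, portfolios = portfolio.st)
Rprof(NULL)
## the top of by.total will typically point straight at applyRules
head(summaryRprof("quantstrat.prof")$by.total, 10)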

> Or do you optimize indicator parameters outside of quantstrat (blotter) 
> and apply quantstrat only to the optimized parameters?

See
?applyParameter

This function (and all of the parameter testing code) is brand new and
needs further refinement.  Today, we have methods only for brute force
and random selection of parameters.  A genetic algorithm would be
great, but we need an objective function.
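
To make the shape of the problem concrete, here is a heavily hedged
sketch of how an evolutionary optimizer such as DEoptim might eventually
wrap a backtest.  run.backtest is a hypothetical helper (not part of
quantstrat) that applies the strategy with the supplied parameters and
returns trade statistics, and objective is the penalized function
sketched earlier; the parameter bounds are illustrative.

library(DEoptim)

## DEoptim minimizes, so negate the (maximizing) objective
fitness <- function(par) {
    stats <- run.backtest(n = round(par[1]), threshold = par[2])
    -objective(stats)
}

## search an RSI lookback in [2, 30] and a threshold in [50, 90],
## mirroring the population of 50 and 100 iterations mentioned above
res <- DEoptim(fitness, lower = c(2, 50), upper = c(30, 90),
               control = DEoptim.control(NP = 50, itermax = 100))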


> To end without a thank-you for the hard work on quantstrat would be totally 
> unfair, as it has so many benefits for so many of us.

Thanks.  We certainly appreciate the acknowledgement.

> By the same token, it would be pointless to embark on an endeavour to re-invent 
> the wheel; rather, we should look for ways to improve the performance 
> of the applyStrategy function.

I agree completely.  We want to improve what we have, not start over, as
this is already the (current) culmination of several years of work.

> Looking forward to your feedback.

I am looking forward to your continued input and contributions!

Regards,

   - Brian

References:
[1] http://bioinformatica.mty.itesm.mx/?q=galgo2
    http://bioinformatics.oxfordjournals.org/content/22/9/1154.full  
    (second reference contains broken link to package)

[2] Adaptive Differential Evolution: A Robust Approach to Multimodal
Problem Optimization
Jingqiao Zhang and Arthur C. Sanderson 
http://www.amazon.com/Adaptive-Differential-Evolution-Multimodal-Optimization/dp/3642015263/

-- 
Brian G. Peterson
http://braverock.com/brian/
Ph: 773-459-4973
IM: bgpbraverock


