[R-SIG-Finance] Quantstrat - running applyStrategy in a loop

Mon Aug 20 13:40:28 CEST 2018

On Sun, 2018-08-19 at 17:16 -0400, James Hirschorn wrote:
> I plan to try it out myself, but I wanted to check here if running
> applyStrategy in a loop, while looping over different dates, will
> work? I could not find any examples of this.
> 
> There are 2 reasons for wanting to do this: First of all, one could
> have a couple of years of tick data, which is too big to fit in
> memory for each symbol. Of course, I am assuming that the orders
> placed by the strategy are sparse enough so that the order_book
> generated by applyStrategy can still fit in memory.
> 
> The second reason is that if this loop could moreover be run in
> parallel, then there could potentially be a 500x speed up for two
> years of data.

James,

The answer is 'it depends'.

There is a parallel version of applyStrategy in the sandbox on GitHub.
I haven't touched it in several years, so I wouldn't trust that code. I
mention it as an example of what is theoretically possible.  A better
example, which is already parallelized and much more highly utilized,
is apply.paramset().

First, to expand on Ilya's answer, let's talk about what *is* possible.

It is possible to wrap a foreach loop over applyStrategy that would
separate symbols to different workers (though your hypothesized 500x
speedup would require *at least* 500 worker nodes, spread out over
several physical machines, using something like doRedis, which we have
tested up to around 200 workers).  This assumes that each symbol is
completely independent, and that there is no interaction on things like
trade sizing or capital or risk among the symbols.  The simplest way to
do this would be to create separate portfolios per symbol, so that each
worker is completely independent.  See examples of a different kind of
splitting and parallelization in apply.paramset() (which is also used
in walk forward testing).
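
A minimal sketch of the per-symbol split described above, not a tested
implementation. It assumes the foreach package is installed, and that the
strategy object, instrument definitions, and market data are already
available on every worker. The helper names portfolio_name, run_symbol,
and run_all_symbols are hypothetical. The quantstrat/blotter calls sit
inside run_symbol, so nothing from those packages runs until you call it.

```r
library(foreach)

# One portfolio name per symbol, so no worker shares state with another.
portfolio_name <- function(symbol) paste0("portf.", symbol)

# Hypothetical per-symbol worker. Assumes `strategy` is a quantstrat
# strategy and that the symbol's market data is visible to the worker.
run_symbol <- function(symbol, strategy, initDate) {
  pname <- portfolio_name(symbol)
  blotter::initPortf(pname, symbols = symbol, initDate = initDate)
  quantstrat::initOrders(portfolio = pname, initDate = initDate)
  quantstrat::applyStrategy(strategy = strategy, portfolios = pname)
  blotter::updatePortf(pname)
  blotter::getTxns(Portfolio = pname, Symbol = symbol)
}

# Fan the symbols out over whatever foreach backend is registered.
# Errors are returned in place rather than aborting the whole run.
run_all_symbols <- function(symbols, worker, ...) {
  args <- list(...)
  out <- foreach(sym = symbols, .errorhandling = "pass") %dopar% {
    do.call(worker, c(list(sym), args))
  }
  names(out) <- symbols
  out
}

# Sequential placeholder backend; swap in a parallel backend such as
# doParallel or doRedis to actually distribute the work.
registerDoSEQ()
```

Usage would look something like
run_all_symbols(c("AAA", "BBB"), run_symbol, strategy = my_strategy,
initDate = "2016-01-01"), where the symbols and my_strategy are
placeholders. Each element of the result is that symbol's transaction
table, or the error object if that worker failed.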

It is also possible, and we commonly do this, to segment the dates that
you want to run applyStrategy over.  As you hypothesized, a simple loop
over date regions, loading different non-conflicting time series, may
be applied to successively run each date range.  This, as you noted,
works well when even 64, 128, or 512GB+ of RAM is not enough for all of
your data.  We've made a number of changes over the years to make
quantstrat more memory efficient, but copies are still made when
unavoidable, state is kept between the various nested apply* functions,
and RAM use basically grows throughout the run of a strategy
evaluation.  So segmenting the use of market data by Dates can help,
though you may need to discard some intermediary results (like portions
of the order book) to make everything fit.  
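
A minimal sketch of the date-segmented loop described above, in base R
only. The helper names month_segments and run_segments are hypothetical,
as are the two callbacks: load_segment is assumed to return the market
data for one date range, and run_segment is assumed to wrap your
applyStrategy() call for that data. Segments are processed strictly in
chronological order, and each segment's data is dropped before the next
one is loaded.

```r
# Split [from, to] into consecutive, non-overlapping calendar-month
# segments. The first segment starts at `from`, the last ends at `to`.
month_segments <- function(from, to) {
  from <- as.Date(from)
  to <- as.Date(to)
  starts <- seq(as.Date(format(from, "%Y-%m-01")), to, by = "month")
  ends <- c(starts[-1] - 1, to)
  starts[1] <- from
  data.frame(start = starts, end = ends)
}

# Run the segments one after another, in date order.
#   load_segment(start, end): user-supplied loader for one date range
#   run_segment(mktdata):     user-supplied wrapper around applyStrategy()
run_segments <- function(segments, load_segment, run_segment) {
  results <- vector("list", nrow(segments))
  for (i in seq_len(nrow(segments))) {
    mktdata <- load_segment(segments$start[i], segments$end[i])
    results[[i]] <- run_segment(mktdata)
    rm(mktdata)  # release this segment's data before loading the next
    gc()
  }
  results
}

segs <- month_segments("2018-01-15", "2018-03-10")
```

Monthly segments are only an illustration; with tick data a daily or
weekly split may be what actually fits in memory. Whatever run_segment
returns should be small (for example a P&L summary), since keeping the
full results of every segment would defeat the purpose.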

In the first example, parallelizing by symbol, RAM is still your most
likely issue, since even very large machines rarely have more than
about 16GB per core/thread.

You still have some wrinkles here.  Again, you need to assess whether
there is any interaction.  Transactions cannot be added to a portfolio
out of order, as the P&L is (potentially) dependent on prior
transactions.  So you may again need to create multiple portfolios and
stitch the different period P&L together yourself.
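
A minimal sketch of stitching per-period P&L together, in base R only.
It assumes each period's result has been reduced to a data frame with a
date column and a pnl column (both names are hypothetical), and that
every period ends flat. If a position is carried across a period
boundary, simply concatenating P&L is not valid, for exactly the reason
given above. The function name stitch_pnl is also hypothetical.

```r
# Combine per-period P&L tables into one chronological series.
# Each element of pnl_list is a data.frame with columns `date` and `pnl`.
stitch_pnl <- function(pnl_list) {
  pnl <- do.call(rbind, pnl_list)
  pnl <- pnl[order(pnl$date), , drop = FALSE]
  # Overlapping periods would double count P&L, so refuse them.
  if (anyDuplicated(pnl$date)) {
    stop("periods overlap: duplicate dates found")
  }
  pnl$cum_pnl <- cumsum(pnl$pnl)
  rownames(pnl) <- NULL
  pnl
}

# Two hypothetical periods, deliberately passed out of order.
jan <- data.frame(date = as.Date(c("2018-01-02", "2018-01-03")),
                  pnl = c(10, -4))
feb <- data.frame(date = as.Date("2018-02-01"), pnl = 5)
stitched <- stitch_pnl(list(feb, jan))
```

Sorting here is safe only because this operates on finished P&L numbers,
not on transactions; the transactions themselves still have to be
generated in order within each portfolio.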

So, in the 'don't do that' camp: don't try to apply transactions out of
order; the trade blotter won't allow it.

In the 'should work' camp are several variations of splitting your
computational problem so that it is amenable to looping and/or
parallelization, described above.

Regards,

Brian

-- 
Brian G. Peterson
http://braverock.com/brian/
Ph: 773-459-4973
IM: bgpbraverock