[R-SIG-Finance] Quantstrat - running applyStrategy in a loop

Tue Aug 21 20:21:18 CEST 2018

Thanks for the very detailed reply!

I couldn't find a parallel applyStrategy in the sandbox. Is it still there,
and if so what is the filename?

In any case, if I understood correctly it should instead be modelled on
apply.paramset().

You mentioned that you commonly segment the dates that you run
applyStrategy over. If you have an example, could you please point it out?

I will report back once I attempt a parallelization.

Yes, I have run into problems with out of order transactions in two
different situations. One of them was when using the delay argument of
ruleSignal, as you had suggested in an SO answer. But that is off topic for
this thread...

Regards,
James

On Mon, Aug 20, 2018 at 7:40 AM, Brian G. Peterson <brian using braverock.com>
wrote:

> On Sun, 2018-08-19 at 17:16 -0400, James Hirschorn wrote:
> > I plan to try it out myself, but I wanted to check here if running
> > applyStrategy in a loop, while looping over different dates, will
> > work? I could not find any examples of this.
> >
> > There are 2 reasons for wanting to do this: First of all, one could
> > have a couple of years of tick data, which is too big to fit in
> > memory for each symbol. Of course, I am assuming that the orders
> > placed by the strategy are sparse enough so that the order_book
> > generated by applyStrategy can still fit in memory.
> >
> > The second reason is that if this loop could moreover be run in
> > parallel, then there could potentially be a 500x speed up for two
> > years of data.
>
> James,
>
> The answer is 'it depends'.
>
> There is a parallel version of applyStrategy in the sandbox on github.
> I haven't touched it in several years, so I wouldn't trust that code. I
> mention it as an example of what is theoretically possible.  A better
> example, which is already parallelized and much more highly utilized,
> is apply.paramset().
>
> First, to expand on Ilya's answer, let's talk about what *is* possible.
>
> It is possible to wrap a foreach loop over applyStrategy that would
> separate symbols to different workers (though your hypothesized 500x
> speedup would require *at least* 500 worker nodes, spread out over
> several physical machines, using something like doRedis, which we have
> tested up to around 200 workers).  This assumes that each symbol is
> completely independent, and that there is no interaction on things like
> trade sizing or capital or risk among the symbols.  The simplest way to
> do this would be to create separate portfolios per symbol, so that each
> worker is completely independent.  See examples of a different kind of
> splitting and parallelization in apply.paramset() (which is also used
> in walk forward testing).
>
> It is also possible, and we commonly do this, to segment the dates that
> you want to run applyStrategy over.  As you hypothesized, a simple loop
> over date regions, loading different non-conflicting time series, may
> be applied to successively run each date range.  This, as you noted,
> works well when even 64, 128, or 512GB+ of RAM is not enough for all of
> your data.  We've made a number of changes over the years to make
> quantstrat more memory efficient, but copies are still made when
> unavoidable, state is kept between the various nested apply* functions,
> and RAM use basically grows throughout the run of a strategy
> evaluation.  So segmenting the use of market data by Dates can help,
> though you may need to discard some intermediary results (like portions
> of the order book) to make everything fit.
>
> In the first example of parallelizing by symbol, RAM is your most
> likely issue still, since even very large machines rarely have more
> than about 16GB per core/thread.
>
> You still have some wrinkles here.  Again, you need to assess whether
> there is any interaction.  Transactions cannot be added to a portfolio
> out of order, as the P&L is (potentially) dependent on prior
> transactions.  So you may again need to create multiple portfolios and
> stitch the different period P&L together yourself.
>
> So, in the 'don't do that' camp: don't try to apply transactions out of
> order; the trade blotter won't allow it.
>
> In the 'should work' camp are several variations of splitting your
> computational problem so that it is amenable to looping and/or
> parallelization, described above.
>
> Regards,
>
> Brian
>
> --
> Brian G. Peterson
> http://braverock.com/brian/
> Ph: 773-459-4973
> IM: bgpbraverock
>
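
For reference, the date-segmentation idea described in the quoted message
(loop over non-overlapping date ranges, keep each range's results separate,
then stitch the per-period P&L together in date order) might look roughly
like the base R sketch below. This is only an illustration, not code from
quantstrat or its sandbox: run_segment() is a hypothetical stand-in for
"load this range's market data and call applyStrategy() on a portfolio
created for this range", the P&L numbers are placeholders, and the sketch
assumes the strategy is flat at every segment boundary so that segments do
not interact.

```r
# Hypothetical stand-in for one segment's backtest. In a real run this is
# where the market data for [from, to] would be loaded and applyStrategy()
# called against a portfolio that exists only for this segment.
run_segment <- function(from, to) {
  days <- seq(from, to, by = "day")
  # Placeholder daily P&L in {-1, 0, 1}, deterministic, for illustration only
  data.frame(date = days, pnl = as.numeric(format(days, "%d")) %% 3 - 1)
}

# Split the full test period into non-overlapping calendar-month segments.
starts <- seq(as.Date("2018-01-01"), as.Date("2018-03-01"), by = "month")
ends   <- c(starts[-1] - 1, as.Date("2018-03-31"))

# Run the segments one at a time, so that only one segment's data needs to
# be in memory at once; keep only the small per-segment result.
results <- vector("list", length(starts))
for (i in seq_along(starts)) {
  results[[i]] <- run_segment(starts[i], ends[i])
  gc()  # ask R to release memory from the finished segment
}

# Stitch the per-segment P&L together in date order and accumulate it.
pnl <- do.call(rbind, results)
pnl <- pnl[order(pnl$date), ]
pnl$cum_pnl <- cumsum(pnl$pnl)
```

Because each segment returns an independent result, the same run_segment()
calls could in principle be handed to foreach workers instead of a plain for
loop, as long as the stitching step still orders everything by date.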
