[R-SIG-Finance] Processing time of backtests on a single computer

Sat Apr 9 15:28:31 CEST 2016

On Fri, 2016-04-08 at 08:50 +0300, Jersey Fanatic wrote:
> So here are my latest results of testing using the same dataset and
> rules
> (no trailing stoploss):
> 8-core using doSNOW -> 1.07 hours
> 4-core using doSNOW -> 59.7 minutes
> single-core -> 2.11 hours
>
> So yeah, I guess it is normal for a strategy with high number of
> transactions to take this long to backtest.

I tested a version of this strategy on a 6-core (12 thread) i7-4930K
CPU @ 3.40GHz with 64GB of RAM. [Ref. 1 processingtime_q.ps.R]

4.64 hrs  one core (registerDoSEQ)
2.12 hrs  six cores using doMC (defaults)
1.77 hrs  twelve threads using doMC (defaults)
1.32 hrs  six cores using doMC (mc.prescheduling=FALSE)
  44 min  twelve threads using doMC (mc.prescheduling=FALSE)
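
To put the timings above on a common scale, here is a minimal sketch
that converts them to speedup and parallel efficiency relative to the
single-core run. The labels and the helper name speedup_table are mine,
for illustration only; the numbers are the ones reported above.

```python
# Wall-clock times reported above: (workers, hours).
runs = {
    "1 core (registerDoSEQ)": (1, 4.64),
    "6 cores, doMC defaults": (6, 2.12),
    "12 threads, doMC defaults": (12, 1.77),
    "6 cores, prescheduling off": (6, 1.32),
    "12 threads, prescheduling off": (12, 44 / 60),
}

baseline = runs["1 core (registerDoSEQ)"][1]


def speedup_table(runs, baseline):
    """Return {label: (speedup, efficiency)} relative to the serial run."""
    table = {}
    for label, (workers, hours) in runs.items():
        speedup = baseline / hours
        # Efficiency is speedup per worker; 1.0 would be perfect scaling.
        table[label] = (speedup, speedup / workers)
    return table


results = speedup_table(runs, baseline)
for label, (speedup, efficiency) in results.items():
    print(f"{label:32s} speedup {speedup:5.2f}x  efficiency {efficiency:4.0%}")
```

Note that the twelve "workers" are hyperthreads on six physical cores,
so per-thread efficiency understates how well the physical cores are
being used.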

My script was based on the script originally posted to this thread, and
likely had more rules and tighter parameters than the test reported by
the OP above.

Given a single script and parameter combination verified with the OP,
this machine was about 2.6 times as fast: 5.15 min for the OP versus
1.99 min on the test machine reported here.
[Ref. 1 processingtime_q_rsigfinance.R]

Some observations:

RAM:
- all tests consumed more than 8GB of RAM at some point (8.1GB for the
single-thread version, 11.8GB for the six-thread version, and 18GB for
the 12-thread version)

CPU load:
- with default settings, the 6-thread test had a load average of about
8 and the 12-thread test had a load average below 7, suggesting that
the 12-thread test was using resources less efficiently. With
prescheduling set to FALSE, load averages were higher for both tests:
about 10 for the 6-core test and about 16 for the 12-thread test.

Load Balancing:
- after about an hour, a load balancing problem was observed: fewer than
half the cores/threads were still executing at 100%. In the case of the
twelve-thread version, after an hour and a half, only one CPU was still
spinning at 100%. It is probable that using a backend designed for load
balancing, such as doRedis or zmq, or taking advantage of a different
prescheduling method in the multicore or SNOW backends, would shorten
the execution time, potentially by a lot.

Economic Justification:
- I observed that the shortest-timeframe indicator and signal processes
are the most aggressive, often scratching trades. If trade costs were
taken into account when analyzing the signal process (before even
contemplating rules and a backtest), these short-timeframe signals would
have been ruled out before a brute-force parameter search. These
parameter combinations also take the longest to run, in addition to
likely being economically infeasible.
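
As a rough sketch of that kind of pre-screen, the snippet below drops
parameter sets whose average per-trade return does not cover a fixed
round-trip cost. Everything here is hypothetical: the function names,
the return series, and the cost figure are invented for illustration
and are not taken from the test scripts.

```python
def net_expectancy(gross_returns, cost_per_trade):
    """Mean per-trade return after subtracting a fixed round-trip cost."""
    if not gross_returns:
        return 0.0
    return sum(gross_returns) / len(gross_returns) - cost_per_trade


def screen(candidates, cost_per_trade):
    """Keep only parameter sets whose signals still pay after costs."""
    return [
        name
        for name, returns in candidates.items()
        if net_expectancy(returns, cost_per_trade) > 0
    ]


# Hypothetical per-trade gross returns (in price points) for two settings.
candidates = {
    "fast (short lookback)": [0.02, -0.01, 0.01, -0.02, 0.03, 0.00],
    "slow (long lookback)": [0.40, -0.15, 0.25],
}

# Hypothetical round-trip cost, in the same units as the returns.
survivors = screen(candidates, cost_per_trade=0.05)
print(survivors)
```

Only the survivors would then go into the parameter search, which
removes the slowest combinations from the run before any backtest
time is spent on them.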

References:

[1] The data file and script have been added to quantstrat's sandbox
directory in SVN, for those who are interested.

    /pkg/quantstrat/sandbox/paramtest201604/

-- 
Brian G. Peterson
http://braverock.com/brian/
Ph: 773-459-4973
IM: bgpbraverock