[R-SIG-Finance] Parallelizing applyStrategy to multiple symbols

Tue Mar 7 22:17:51 CET 2017

Hi Atakan,

I use a batch file to run most of my R programs. That way I just have to get
it right once and then I can run it many times. The following is a simple
batch file, scatter_plot.bat, that runs some regressions:

"C:\Program Files\R\R-3.0.2\bin\x64\R.exe" CMD BATCH "Scatter_Plot.txt" "Scatter_Plot.out"

Scatter_Plot.txt contains generic R commands that use data in the current
directory. Scatter_Plot.out will contain the output from the commands in the
text file.
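The same job can also be launched from inside R. Below is a minimal sketch that uses base R's `system2()` to call `R CMD BATCH` in a given directory. The helper name `run_batch()` is hypothetical, the file names follow the example above, and the `--vanilla` flag is an added assumption so that no workspace is restored or saved.

```r
# Hypothetical helper: run one script with R CMD BATCH inside `dir`,
# the way scatter_plot.bat does from the command line.
run_batch <- function(dir, infile = "Scatter_Plot.txt",
                      outfile = "Scatter_Plot.out", wait = TRUE) {
  # use the same R installation that is running this code
  r_bin <- file.path(R.home("bin"),
                     if (.Platform$OS.type == "windows") "R.exe" else "R")
  old_wd <- setwd(dir)     # the script reads data from its own directory
  on.exit(setwd(old_wd))   # restore the caller's working directory
  # system2() quotes the command itself; arguments need shQuote()
  system2(r_bin, c("CMD", "BATCH", "--vanilla",
                   shQuote(infile), shQuote(outfile)), wait = wait)
}
```

With `wait = TRUE` the call returns the exit status of R CMD BATCH. With `wait = FALSE` it returns immediately, so several month directories could be started one after another and run at the same time.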

If I'm analyzing SPY data for 2016, I would use a directory structure like:

\SPY\2016\01
\SPY\2016\02
\SPY\2016\03
.
.
.

That way I can analyze one month's data at a time and save the output in
that month's directory: January's data and output go in \SPY\2016\01, and so
on. I have 8 execution paths, so I can run 8 months of data simultaneously.
My program is small and does not use up all available physical memory. I
would start the final 4 months as soon as 4 of the first 8 are finished.
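With one directory per month, the "run 8, then start the rest as slots free up" scheme maps onto a load-balanced worker pool. The following is a sketch using base R's parallel package. It assumes a hypothetical per-month function `analyze_month()` that stands in for the contents of Scatter_Plot.txt; the real analysis is not shown here.

```r
library(parallel)

# Hypothetical per-month job standing in for Scatter_Plot.txt: it reads only
# its own month directory and writes its output there.
analyze_month <- function(month_dir) {
  files  <- list.files(month_dir, pattern = "\\.csv$")
  result <- sprintf("%s: %d data file(s)", basename(month_dir), length(files))
  writeLines(result, file.path(month_dir, "output.txt"))
  result
}

# Run every month on a fixed pool of workers. With chunk.size = 1
# (R >= 3.5.0) each month is its own task, so with 12 months and 8 workers
# the last 4 months start as soon as workers free up.
run_months <- function(month_dirs, job = analyze_month, workers = 8) {
  cl <- makeCluster(min(workers, length(month_dirs)))
  on.exit(stopCluster(cl))
  unlist(parLapplyLB(cl, month_dirs, job, chunk.size = 1))
}
```

For the layout above, a call might look like `run_months(normalizePath(file.path("SPY", "2016", sprintf("%02d", 1:12))), workers = 8)`. Absolute paths are used so the workers do not depend on their working directory.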

What Brian is saying, as I understand it, is that if I ran more than 8
data-intensive regressions at once, the OS would spend extra time deciding
which thread from which process gets loaded into the next available
execution path. Worse, if I used more than the available physical memory, a
thread that had been swapped out to disk would have to be loaded back into
memory before it could run, while some other process in memory was swapped
out to the hard drive. This traffic slows things down dramatically.
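A rough way to apply this rule when choosing how many jobs to run at once is to cap the count by the number of logical CPUs and by how many jobs fit in physical memory. The helper below is a hypothetical sketch. The per-job memory figure, the installed RAM, and the 2 GB reserved for the OS are estimates you supply, since base R has no portable call for total RAM.

```r
# Hypothetical helper: how many jobs to run at once without oversubscribing
# execution paths or pushing the machine into swap.
# mem_per_job_gb, total_mem_gb and reserve_gb are user-supplied estimates.
choose_workers <- function(n_jobs, mem_per_job_gb, total_mem_gb,
                           threads = parallel::detectCores(),
                           reserve_gb = 2) {
  if (is.na(threads)) threads <- 1L   # detectCores() can return NA
  by_mem <- floor((total_mem_gb - reserve_gb) / mem_per_job_gb)
  max(1L, as.integer(min(n_jobs, threads, by_mem)))
}
```

For example, 12 monthly jobs needing about 1.5 GB each on a 16 GB machine with 8 execution paths give `choose_workers(12, 1.5, 16, threads = 8)`, which is 8 (CPU-bound). At 4 GB per job the same call returns 3 (memory-bound).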

At the end of the batch file, the output is copied up one directory, in this
case to 2016, with the year and month appended to a generic file name. There
is a batch file in 2016 to concatenate all data from the different months
into one file for 2016.
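The concatenation step could equally be done in R. This sketch assumes a hypothetical naming scheme, output_<year>_<month>.txt in the year directory. The actual generic file name is not given above.

```r
# Hypothetical naming: each month's output was copied into the year directory
# as <prefix>_<year>_<MM>.txt. Concatenate them, in month order, into
# <prefix>_<year>.txt in the same directory.
combine_year <- function(year_dir, year, prefix = "output") {
  pattern  <- sprintf("^%s_%s_[0-9]{2}\\.txt$", prefix, year)
  files    <- sort(list.files(year_dir, pattern = pattern, full.names = TRUE))
  combined <- unlist(lapply(files, readLines))
  out_file <- file.path(year_dir, sprintf("%s_%s.txt", prefix, year))
  writeLines(combined, out_file)
  out_file
}
```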

Best,

Frank
Chicago

-----Original Message-----
From: Atakan Okan [mailto:atakanokan at outlook.com] 
Sent: Monday, March 06, 2017 4:37 PM
To: Frank <frankm60606 at gmail.com>
Cc: Brian G. Peterson <brian at braverock.com>; r-sig-finance at r-project.org
Subject: Re: [R-SIG-Finance] Parallelizing applyStrategy to multiple symbols

Hi Frank,

I just thought of an idea based on your suggestion. Instead of trying to
implement a foreach loop, I will split my symbol set into subsets, open a
separate R session for each one using RStudio's option to create a new R
session, and run each subset in its own session with the default call to
applyStrategy. I think this is what you were suggesting, but I may have
misunderstood.
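One way to prepare those sessions is to split the symbol vector into roughly equal subsets and save each subset to its own file in a directory every session can see. The helper name and the symbols_part_<i>.rds file names are hypothetical. Each session would then load its subset with `readRDS()` and run the original applyStrategy code on it.

```r
# Split `symbols` into n_sessions roughly equal, contiguous subsets and save
# each one as symbols_part_<i>.rds (hypothetical file names) in out_dir,
# which must be a directory shared by all sessions.
split_symbols <- function(symbols, n_sessions, out_dir) {
  groups <- cut(seq_along(symbols), n_sessions, labels = FALSE)
  parts  <- split(symbols, groups)
  paths  <- file.path(out_dir, sprintf("symbols_part_%d.rds", seq_along(parts)))
  for (i in seq_along(parts)) saveRDS(parts[[i]], paths[i])
  invisible(paths)
}
```

In the third session, for example: `symbols <- readRDS(file.path(out_dir, "symbols_part_3.rds"))`.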

Hi Brian,

My understanding of parallelization wasn't enough to grasp all of your
reply, but I am not planning on rebalancing or testing any strategy that
needs to "talk" to other threads. Each symbol is backtested on its own,
without any input or output to and from the other symbols' backtests. Would
the idea above work in this case? I think I explained my problem
inadequately: the completion time of a single symbol's backtest is not the
issue. The main issue is that the symbols' backtests are computed
sequentially, so the total completion time grows linearly with the number of
symbols. I just want to distribute the per-symbol applyStrategy calls across
the CPU cores my laptop has to speed up the process, much like
apply.paramset does, but for each symbol instead of each parameter
combination. I hope that explains it better.
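Since each symbol's backtest is independent, the per-symbol work can also be handed to a pool of worker processes from a single session. The following is a sketch using base R's parallel package, not quantstrat's own API. `backtest_fun` is a hypothetical wrapper you would write around the per-symbol code from the original post (getSymbols, initPortf, applyStrategy, updatePortf, tradeStats), returning one row per symbol.

```r
library(parallel)

# Run one independent backtest per symbol on a pool of worker processes.
# backtest_fun is a hypothetical user-written wrapper: it takes one symbol
# and returns a one-row data.frame (e.g. the output of tradeStats()).
backtest_symbols <- function(symbols, backtest_fun, workers = 4,
                             packages = character(0)) {
  cl <- makeCluster(min(workers, length(symbols)))
  on.exit(stopCluster(cl))
  # attach the needed packages (e.g. "quantstrat") once on every worker
  clusterCall(cl, lapply, packages, library, character.only = TRUE)
  # chunk.size = 1 (R >= 3.5.0): each symbol is its own task, so a free
  # worker immediately picks up the next symbol
  results <- parLapplyLB(cl, symbols, backtest_fun, chunk.size = 1)
  do.call(rbind, results)   # one row per symbol
}
```

A call might look like `backtest_symbols(symbols, my_backtest, workers = parallel::detectCores(), packages = c("quantstrat", "quantmod"))`, where `my_backtest` is hypothetical. Having every worker call getSymbols itself may still run into the I/O contention Brian mentions.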

Thanks for the help.

Best,

Atakan Okan

> On 6 Mar 2017, at 23:55, Frank <frankm60606 at gmail.com> wrote:
> 
> Atakan,
> 
> What kind of computer do you have? Number of cores, memory, hyperthreaded
> or not?
> 
> Brian: does this package take advantage of hyperthreading? Your comment
> suggests it does for multiple cores, so I would assume it does for
> hyperthreading too.
> 
> When I do compute-intensive work outside R, I break it up into chunks of 8.
> I have a hyperthreaded i7, and running 8 chunks at once pegs the CPU at
> 100%. With a similar setup, you could break your 100-symbol list into 8
> datasets and run them simultaneously.
> 
> Regardless, adding memory is usually a cheap and mindless way to improve
> throughput.
> 
> Best,
> 
> Frank
> Chicago, IL
> 
> -----Original Message-----
> From: R-SIG-Finance [mailto:r-sig-finance-bounces at r-project.org] On 
> Behalf Of Brian G. Peterson
> Sent: Monday, March 06, 2017 1:46 PM
> To: Atakan Okan <atakanokan at outlook.com>; r-sig-finance at r-project.org
> Subject: Re: [R-SIG-Finance] Parallelizing applyStrategy to multiple 
> symbols
> 
> I suspect you're running up against communication and memory management
> time and resource contention.
> 
> applyIndicators and applySignals should all be using vectorized code, so
> the potential benefit from parallelizing them will likely be negative, as
> communication and memory management overhead will swamp any benefit from
> the calculations.
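Brian's point about overhead can be checked directly by timing a cheap vectorized calculation with and without workers. In the sketch below, the 20-period moving average (via `stats::filter()`), the series sizes, and the helper names are illustrative assumptions. Cluster start-up time is excluded, so the parallel timing reflects the work plus the cost of shipping each series to a worker and the result back. Results will vary by machine.

```r
library(parallel)

# Illustrative cheap, vectorized task: a 20-period simple moving average.
sma20 <- function(x) as.numeric(stats::filter(x, rep(1 / 20, 20), sides = 1))

# Time the same work sequentially and on a worker pool. Cluster start-up is
# excluded; the parallel timing includes serializing data to and from workers.
compare_overhead <- function(n_series = 8, n_obs = 1e5, workers = 2) {
  series <- replicate(n_series, cumsum(rnorm(n_obs)), simplify = FALSE)
  seq_time <- system.time(lapply(series, sma20))[["elapsed"]]
  cl <- makeCluster(workers)
  on.exit(stopCluster(cl))
  par_time <- system.time(parLapply(cl, series, sma20))[["elapsed"]]
  c(sequential = seq_time, parallel = par_time)
}
```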
> 
> applyRules might benefit from parallelization, but you would need to bring
> the results back together at every rebalancing period. You would also incur
> significant copying time.
> 
> If you were going to make this work, you'd need to minimize copies.
> Your effective 'reduce' operation at the end, returning only tradeStats,
> could do this for the end of the calculation, but at the start you'd need
> to be smarter about how you segment market data to each worker.
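A sketch of that segmentation in base R follows. It assumes market data has already been loaded once on the master into a named list (hypothetical `market_data`). Each task carries only its own symbol's data, and each worker returns only a small summary, which is the "reduce" step. `run_task()` is defined at top level on purpose: a closure created inside `run_segmented()` would serialize that function's whole environment, including all of `market_data`, to every worker.

```r
library(parallel)

# Top-level task runner: its environment is the global environment, which is
# sent by reference, so no local data is dragged along to the workers.
run_task <- function(task, summarize_fun) summarize_fun(task$symbol, task$data)

# market_data: hypothetical named list, one element of market data per symbol,
# loaded once on the master. summarize_fun(symbol, data) must return a small
# object, e.g. a one-row data.frame.
run_segmented <- function(market_data, summarize_fun, workers = 2) {
  cl <- makeCluster(min(workers, length(market_data)))
  on.exit(stopCluster(cl))
  # one task per symbol, carrying only that symbol's data
  tasks <- Map(function(sym, x) list(symbol = sym, data = x),
               names(market_data), market_data)
  results <- parLapply(cl, tasks, run_task, summarize_fun = summarize_fun)
  do.call(rbind, results)  # the "reduce" step: stack the small summaries
}
```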
> 
> Just putting getSymbols on the workers might run into I/O contention
> issues. You also don't need to redeclare the strategy object; you could
> just copy it to each worker.
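Copying a prebuilt object to each worker once, rather than rebuilding it in every task, can be done with `clusterExport()` from base R's parallel package. In this sketch, `shared_spec` is a hypothetical stand-in for a strategy object built once on the master. How a worker would then hand such an object to applyStrategy is not shown and is left as an assumption.

```r
library(parallel)

# shared_spec: hypothetical stand-in for an object that is expensive to build
# (such as a strategy definition), built once on the master.
# task_fun(symbol) reads shared_spec from the worker's global environment.
run_with_shared <- function(symbols, task_fun, shared_spec, workers = 2) {
  cl <- makeCluster(min(workers, length(symbols)))
  on.exit(stopCluster(cl))
  # copy the object to each worker once, instead of rebuilding it per task
  clusterExport(cl, "shared_spec", envir = environment())
  parLapply(cl, symbols, task_fun)
}
```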
> 
> When we've done this as a one-off, we typically create portfolios for
> each segment and try to avoid as many copies as we can.
> 
> You'd need to profile to see exactly where you're getting hung up, but
> this approach seems too simplistic (see my first sentence for hints).
> 
> We haven't bothered to do this in the package itself since, with a little
> work, we can usually get to around one core-minute per symbol per day on L1
> tick data, which means that even a large backtest on tick data can finish
> in a few hours. The gain from optimizing execution time doesn't seem to be
> worth the cost in programming and testing time.
> 
> Regards,
> 
> Brian
> 
> --
> Brian G. Peterson
> http://braverock.com/brian/
> Ph: 773-459-4973
> IM: bgpbraverock
> 
>> On Mon, 2017-03-06 at 18:53 +0000, Atakan Okan wrote:
>> Hello,
>> 
>> I am trying to parallelize applyStrategy() to make it faster when
>> applied to multiple symbols. The reproducible code below contains only
>> 3 symbols, so it finishes quickly; however, when I apply it to the
>> 100 symbols in an index, sequential computing takes a long time.
>> What is the best way to accomplish this? Using a foreach loop does not
>> seem to work, and I couldn't find any information on Stack Exchange or
>> the usual mailing lists.
>> 
>> Thanks.
>> 
>> Atakan Okan
>> 
>> Code with applyStrategy (foreach is below this):
>> 
>> library(quantmod)
>> library(quantstrat)
>> 
>> symbols <- c("AAPL","GOOGL","MSFT")
>> 
>> getSymbols(Symbols = symbols, from = "2010-01-01")
>> 
>> currency('USD')
>> stock(symbols, currency="USD")
>> 
>> strategy.st <- "multiple_symbols_parallel_applystrategy"
>> rm.strat(strategy.st)
>> 
>> 
>> initPortf(strategy.st, symbols = symbols)
>> initAcct(strategy.st, portfolios=strategy.st, initEq=100000)
>> initOrders(portfolio=strategy.st)
>> strategy(strategy.st,store=TRUE)
>> 
>> rule.longenter  = TRUE
>> rule.longexit   = TRUE
>> rule.shortenter = TRUE
>> rule.shortexit  = TRUE
>> 
>> txn.model <- 0
>> 
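>> add.indicator(strategy.st,
>>              name = "MACD",
>>              # quote() defers evaluation to applyStrategy, so each symbol's
>>              # own mktdata is used; Cl(get(symbols)) would be evaluated once,
>>              # here, on a single series
>>              arguments = list(x = quote(Cl(mktdata))),
>>              label='macd')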
>> 
>> add.signal(strategy.st,name="sigCrossover",
>>           arguments = list(columns=c("macd.macd","signal.macd"),
>>                            relationship="gt"),
>>           label="macd.gt.signal")
>> 
>> add.signal(strategy.st,name="sigCrossover",
>>           arguments = list(columns=c("macd.macd","signal.macd"),
>>                            relationship="lt"),
>>           label="macd.lt.signal")
>> 
>> add.rule(strategy.st,
>>         name='ruleSignal',
>>         arguments = list(sigcol="macd.gt.signal",
>>                          sigval=TRUE,
>>                          prefer="Open",
>>                          orderqty= 1000,
>>                          #osFUN="osAllInLong",
>>                          ordertype='market',
>>                          orderside='long',
>>                          orderset='ocolong',
>>                          TxnFees = txn.model),
>>         type='enter',
>>         label='longenter',
>>         enabled=FALSE
>> )
>> 
>> add.rule(strategy.st,
>>         name='ruleSignal',
>>         arguments = list(sigcol="macd.lt.signal",
>>                          sigval=TRUE,
>>                          prefer="Open",
>>                          orderqty='all',
>>                          ordertype='market',
>>                          orderside='long',
>>                          orderset='ocolong',
>>                          TxnFees = txn.model),
>>         type='exit',
>>         label='longexit',
>>         enabled=FALSE
>> )
>> 
>> 
>> add.rule(strategy.st,
>>         name='ruleSignal',
>>         arguments = list(sigcol="macd.lt.signal",
>>                          sigval=TRUE,
>>                          prefer="Open",
>>                          orderqty=-1000,
>>                          #osFUN="osAllInShort",
>>                          ordertype='market',
>>                          orderside='short',
>>                          orderset='ocoshort',
>>                          TxnFees = txn.model),
>>         type='enter',
>>         label='shortenter',
>>         enabled=FALSE
>> )
>> 
>> add.rule(strategy.st,
>>         name='ruleSignal',
>>         arguments = list(sigcol="macd.gt.signal",
>>                          sigval=TRUE,
>>                          prefer="Open",
>>                          orderqty='all',
>>                          ordertype='market',
>>                          orderside='short',
>>                          orderset='ocoshort',
>>                          TxnFees = txn.model),
>>         type='exit',
>>         label='shortexit',
>>         enabled=FALSE
>> )
>> 
>> enable.rule(strategy.st,type="enter",label="longenter", enable = rule.longenter)
>> enable.rule(strategy.st,type="exit",label="longexit", enable = rule.longexit)
>> enable.rule(strategy.st,type="enter",label="shortenter", enable = rule.shortenter)
>> enable.rule(strategy.st,type="exit",label="shortexit", enable = rule.shortexit)
>> summary(getStrategy(strategy.st))
>> 
>> 
>> applyStrategy( strategy=strategy.st ,
>>               portfolios=strategy.st,
>>               symbols = symbols,
>>               verbose=TRUE)
>> updatePortf(strategy.st)
>> updateAcct(strategy.st)
>> updateEndEq(strategy.st)
>> 
>> ----------------------------------------------------------------------
>> Code with foreach:
>> 
>> library(quantmod)
>> library(quantstrat)
>> 
>> if(Sys.info()["sysname"] == "Windows") {
>>  library(doSNOW)
>>  cl <- makeCluster(4)
>>  registerDoSNOW(cl)
>> }
>> if(Sys.info()["sysname"] == "Linux") {
>>  library(doMC)
>>  registerDoMC(cores=4)
>>  #registerDoSEQ()
>>  getDoParWorkers()
>> }
>> 
>> symbols <- c("AAPL","GOOGL","MSFT")
>> 
>> # iterate over the symbol names themselves, not over 1:length(symbols)
>> sens.df <- foreach(sym = symbols,
>>                   .combine = 'rbind',
>>                   .packages = c("quantstrat","quantmod")) %dopar% {
>> 
>>  getSymbols(Symbols = sym, from = "2010-01-01")
>> 
>>  currency('USD')
>>  stock(sym, currency="USD")
>> 
>>  strategy.st <- "multiple_symbols_parallel_applystrategy"
>> 
>> rm.strat(strategy.st)
>> 
>> 
>>  initPortf(strategy.st, symbols = sym)
>>  initAcct(strategy.st, portfolios=strategy.st, initEq=100000)
>>  initOrders(portfolio=strategy.st)
>>  strategy(strategy.st,store=TRUE)
>> 
>>  rule.longenter  = TRUE
>>  rule.longexit   = TRUE
>>  rule.shortenter = TRUE
>>  rule.shortexit  = TRUE
>> 
>>  txn.model <- 0
>> 
>>  add.indicator(strategy.st,
>>                name = "MACD",
>>                # quote() defers evaluation so each symbol's own mktdata is used
>>                arguments = list(x = quote(Cl(mktdata))),
>>                label='macd')
>> 
>>  add.signal(strategy.st,name="sigCrossover",
>>             arguments = list(columns=c("macd.macd","signal.macd"),
>>                              relationship="gt"),
>>             label="macd.gt.signal")
>> 
>>  add.signal(strategy.st,name="sigCrossover",
>>             arguments = list(columns=c("macd.macd","signal.macd"),
>>                              relationship="lt"),
>>             label="macd.lt.signal")
>> 
>>  add.rule(strategy.st,
>>           name='ruleSignal',
>>           arguments = list(sigcol="macd.gt.signal",
>>                            sigval=TRUE,
>>                            prefer="Open",
>>                            orderqty= 1000,
>>                            #osFUN="osAllInLong",
>>                            ordertype='market',
>>                            orderside='long',
>>                            orderset='ocolong',
>>                            TxnFees = txn.model),
>>           type='enter',
>>           label='longenter',
>>           enabled=FALSE
>>  )
>> 
>>  add.rule(strategy.st,
>>           name='ruleSignal',
>>           arguments = list(sigcol="macd.lt.signal",
>>                            sigval=TRUE,
>>                            prefer="Open",
>>                            orderqty='all',
>>                            ordertype='market',
>>                            orderside='long',
>>                            orderset='ocolong',
>>                            TxnFees = txn.model),
>>           type='exit',
>>           label='longexit',
>>           enabled=FALSE
>>  )
>> 
>> 
>>  add.rule(strategy.st,
>>           name='ruleSignal',
>>           arguments = list(sigcol="macd.lt.signal",
>>                            sigval=TRUE,
>>                            prefer="Open",
>>                            orderqty=-1000,
>>                            #osFUN="osAllInShort",
>>                            ordertype='market',
>>                            orderside='short',
>>                            orderset='ocoshort',
>>                            TxnFees = txn.model),
>>           type='enter',
>>           label='shortenter',
>>           enabled=FALSE
>>  )
>> 
>>  add.rule(strategy.st,
>>           name='ruleSignal',
>>           arguments = list(sigcol="macd.gt.signal",
>>                            sigval=TRUE,
>>                            prefer="Open",
>>                            orderqty='all',
>>                            ordertype='market',
>>                            orderside='short',
>>                            orderset='ocoshort',
>>                            TxnFees = txn.model),
>>           type='exit',
>>           label='shortexit',
>>           enabled=FALSE
>>  )
>> 
>>  enable.rule(strategy.st,type="enter",label="longenter", enable = rule.longenter)
>>  enable.rule(strategy.st,type="exit",label="longexit", enable = rule.longexit)
>>  enable.rule(strategy.st,type="enter",label="shortenter", enable = rule.shortenter)
>>  enable.rule(strategy.st,type="exit",label="shortexit", enable = rule.shortexit)
>> 
>> summary(getStrategy(strategy.st))
>> 
>> 
>>  applyStrategy( strategy=strategy.st ,
>>                 portfolios=strategy.st,
>>                 symbols = sym,
>>                 verbose=TRUE)
>>  updatePortf(strategy.st)
>>  updateAcct(strategy.st)
>>  updateEndEq(strategy.st)
>> 
>>  # return the tradeStats row as-is; .combine = 'rbind' stacks one row per symbol
>>  tradeStats(strategy.st)
>> 
>> }
>> 
>> if (Sys.info()["sysname"] == "Windows") {
>>  snow::stopCluster(cl)   # doSNOW on Windows
>> }
>> 
>> _______________________________________________
>> R-SIG-Finance at r-project.org mailing list 
>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>> -- Subscriber-posting only. If you want to post, subscribe first.
>> -- Also note that this is not the r-help list where general R 
>> questions should go.
> 