[R-SIG-Finance] Blotter example by kafka from R-bloggers

Stephen Choularton stephen at organicfoodmarkets.com.au
Thu Jan 6 20:52:16 CET 2011


Hi

One thing I want to understand is the effect of stop-loss activity 
on results, and how to employ more complex rules.  The two examples I am 
looking at have fairly simple rules like:

# three days higher close, high and open than on previous day

# one day before
lag1 <- lag(SPY, 1)

# two days before
lag2 <- lag(SPY, 2)

signal <- ifelse((Cl(lag2) > Cl(lag1) & Cl(lag1) > Cl(SPY)) &
                 (Hi(lag2) > Hi(lag1) & Hi(lag1) > Hi(SPY)) &
                 (Op(lag2) > Op(lag1) & Op(lag1) > Op(SPY)),
                 1, 0)

and

# 1 if today's low is higher than yesterday's close, else 0

signal <- ifelse(Lo(SPY) > Cl(tmp), 1, 0)
signal[1] <- 0
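
As an aside, and purely as a sketch of my own (not something from the blog 
post), this is how I understand a 0/1 signal like the second one can be 
turned into a position and evaluated with vector operations alone.  It 
assumes quantmod and PerformanceAnalytics are installed and pulls SPY with 
getSymbols():

library(quantmod)
library(PerformanceAnalytics)

getSymbols("SPY")                      # daily OHLC for SPY from Yahoo

# the second rule above: 1 if today's low is above yesterday's close
sig <- Lo(SPY) > lag(Cl(SPY), 1)       # logical xts, NA on the first bar
sig[is.na(sig)] <- FALSE               # no "yesterday" on the first bar

daily.ret <- dailyReturn(Cl(SPY))      # close-to-close returns
strat.ret <- lag(sig, 1) * daily.ret   # trade on the *next* bar
strat.ret[is.na(strat.ret)] <- 0

charts.PerformanceSummary(strat.ret)

The lag(sig, 1) is the important bit: it stops today's trade from using 
today's close, which seems to be the usual look-ahead trap with vectorized 
signals.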

First on more complex rules: I have tried looking at vector operations 
but trying to write a rule for spreads like this:

[rule for opening]
if not_open(yesterday(last_spread > 2 * standard deviation) and 
today(last_spread < 2 * standard deviation)) -> open short spread (and 
vice versa)

[rule for stop loss]
if open(last_spread > opening_spread * 1.05 [stop loss]) -> close short 
(and vice versa)

[rule for closing]

if open(last_spread < moving average) -> close short (and vice versa)

defeated me, and I ended up writing some code like this (notice that I 
haven't got the stop-loss rule in it):

# carry positions/triggers forward from one bar to the next
long  <- 0
short <- 0
# pre-allocate the signal vectors so they exist before we index into them
sigup <- numeric(nrow(spread.data))
sigdn <- numeric(nrow(spread.data))

for (i in seq_len(nrow(spread.data))) {
     # let's get the data into more usable names
     close.today          <- spread.data[i, 1]
     mean.today           <- spread.data[i, 2]
     upper.boundary.today <- spread.data[i, 3]
     lower.boundary.today <- spread.data[i, 4]
     if (i == 1) {
         # first period: there is no yesterday, so reuse today's values
         close.yesterday          <- close.today
         mean.yesterday           <- mean.today
         upper.boundary.yesterday <- upper.boundary.today
         lower.boundary.yesterday <- lower.boundary.today
     } else {
         close.yesterday          <- spread.data[i - 1, 1]
         mean.yesterday           <- spread.data[i - 1, 2]
         upper.boundary.yesterday <- spread.data[i - 1, 3]
         lower.boundary.yesterday <- spread.data[i - 1, 4]
     }

     ################## RULES FROM HERE ##################
     # spread$Close - spread$Close.1
     ####### FIRST FOR A LONG #####
     ####### first find lower boundary crossings #####
     if (long == 0) position <- 0   # (position is reset here but not otherwise used in this excerpt)
     if (close.yesterday <= lower.boundary.yesterday &&
         close.today > lower.boundary.today) long <- 1
     ####### then find mean crossings (exit) #####
     if (long == 1 && close.today > mean.today) long <- 0
     sigup[i] <- long
     # print(c(i, long, close.yesterday, lower.boundary.yesterday,
     #         close.today, lower.boundary.today))

     ####### THEN FOR A SHORT #####
     ####### first find upper boundary crossings #####
     if (close.yesterday >= upper.boundary.yesterday &&
         close.today < upper.boundary.today) short <- -1
     ####### then find mean crossings (exit) #####
     if (short == -1 && close.today < mean.today) short <- 0
     sigdn[i] <- short
}

So I put it all in a loop and carry my positions/triggers forward from 
one day to the next, which is roughly the way I would normally program.

Can you write rules like the ones I am attempting using vector operations, 
and does blotter lend itself to this?
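
Just to show the sort of thing I am after, here is my best guess at a 
vectorized version of the boundary rules (a sketch only, not verified 
against the loop beyond eyeballing it): mark the bars where an entry or 
exit event fires, leave the other bars NA, and carry the last event 
forward with na.locf().  It assumes spread.data has the same four columns 
as in the loop:

# pull the four columns out as plain numeric vectors
n  <- nrow(spread.data)
cl <- as.numeric(spread.data[, 1])   # Close
mu <- as.numeric(spread.data[, 2])   # Mean (moving average)
up <- as.numeric(spread.data[, 3])   # Upper boundary
lo <- as.numeric(spread.data[, 4])   # Lower boundary

# "yesterday" versions, shifted down by one bar
cl.y <- c(NA, cl[-n])
up.y <- c(NA, up[-n])
lo.y <- c(NA, lo[-n])

# long side: enter on an upward cross of the lower band, exit on a cross
# of the mean; the exit wins if both fire on the same bar, as in the loop
long.entry <- !is.na(cl.y) & cl.y <= lo.y & cl > lo
long.exit  <- cl > mu
sigup <- ifelse(long.exit, 0, ifelse(long.entry, 1, NA))
sigup <- zoo::na.locf(sigup, na.rm = FALSE)   # carry the last event forward
sigup[is.na(sigup)] <- 0                      # flat until the first event

# short side: enter on a downward cross of the upper band, exit on a
# cross of the mean
short.entry <- !is.na(cl.y) & cl.y >= up.y & cl < up
short.exit  <- cl < mu
sigdn <- ifelse(short.exit, 0, ifelse(short.entry, -1, NA))
sigdn <- zoo::na.locf(sigdn, na.rm = FALSE)
sigdn[is.na(sigdn)] <- 0

The stop-loss rule is the part that does not vectorize so neatly, because 
the stop level depends on the (path-dependent) price at which the position 
was opened; as far as I can see that still wants a loop, or an order-based 
framework such as quantstrat sitting on top of blotter.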

Second, on a more general note, this whole question of stop losses is very 
significant to results.  I find that most back-testing is based on not 
adopting such a policy, but prudence would almost always insist on doing 
so.  If you have no real option but to adopt a stop-loss policy, then the 
most important question is what the correct level of protection is.  I get 
very annoyed when my strategy works without a stop loss, and then the 
first time I take a position I get closed out by my stop loss and lose 
money, and the next day or the day after I find the figures would have put 
me back in the black.  Anyway, I guess this is just an iterative process 
using a binary search but, again, are there any useful ideas about how one 
can go about this sort of optimisation re-using existing packages/code?
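
To make the question concrete, the sort of search I have in mind is no 
more than a grid over candidate stop levels; in the sketch below, 
run_backtest() is a hypothetical placeholder for whatever re-runs the 
strategy with a given stop and returns its final equity:

stop.levels <- seq(0.01, 0.10, by = 0.005)    # candidate stops: 1% to 10%

final.equity <- sapply(stop.levels, function(stop.pct) {
    run_backtest(stop.loss = stop.pct)        # hypothetical backtest wrapper
})

best.stop <- stop.levels[which.max(final.equity)]   # highest final equity
plot(stop.levels, final.equity, type = "b",
     xlab = "stop-loss level", ylab = "final equity")

Obviously a proper search would look at drawdowns as well as final equity, 
but the mechanics would be the same.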

Stephen Choularton Ph.D., FIoD


On 29/12/2010 7:18 AM, Brian G. Peterson wrote:
> On 12/28/2010 01:28 PM, Stephen Choularton wrote:
>> My apologies.
>>
>> I did not realize the script ran so slowly.  I reduced the time scale
>> it covered so that it commenced at the beginning of the year, and it did
>> run to completion.  I will try the full term and see if it produces the
>> same graphs as the original example.
>>
>> I'm always a bit worried about warnings, as they often mean something is
>> going wrong, and it might have been useful if kafka had warned one not to
>> worry about them.  Mind you, I think he did say it all took a long time ;-)
>
>
> The reason this script runs slowly is that it is calling updatePortf, 
> updateAcct, and updateEndEq after each and every observation to do 
> order sizing.
>
> As a matter of practice, if you can 'cheat' and say 'I've got $1000000 
> to invest, and I don't mind being a little leveraged', you don't need 
> to do that, and things are *much* faster.  For example, we can 
> typically run a strategy backtest on *tick* data (millions of 
> observations) in less than a minute per day.
>
> The reason for this divergent length of time is that the blotter 
> update* functions do a *lot* of calculations, and all of those take 
> time, even though they are vectorized where possible.
>
> Perhaps a middle ground would be to call the update* functions 
> monthly, or something similar.
>
> I found his example script to be slower than I am used to, but not 
> unbearable, and I believe it finished in a couple of minutes, though 
> it's been a while since I ran it...
>
>
>> I can assure you I do try to read the manual before I ask for help, but
>> dealing with other people's code is not always easy, particularly when
>> working with a programming system that uses a different paradigm, like R
>> with its emphasis on operations on vectors and the like, and its
>> extensive use of calls to functions, each of which often requires a wet
>> towel and a cup of coffee to understand.
>>
>> I added the parameter definitions you suggest:
>>
>> currency("USD")
>> stock("SPY",currency="USD",multiplier=1)
>>
>> and the warnings reduced to one:
>
> Good.
>
>> Warning messages:
>> 1: In updatePortf(ltportfolio, Dates = currentDate) :
>>    Incompatible methods ("Ops.Date", "Ops.POSIXt") for ">="
> <...>
>
>> "Ops.Date", "Ops.POSIXt" don't appear in the function call so they must
>> be somewhere deeper.  I'm afraid I'm currently a windows user so grep is
>> not available and the windows native text search didn't reveal much.
>> However, I did find some references in the documentation (Date-Time
>> Classes, Operators on the Date Class & S3 Group Generic Functions) but
>> Ops.POSIXt doesn't appear therein only POSIXlt and Ops.POSIXct.  Is
>> there a typo somewhere ?
>
> It's likely not a typo, but rather an incompatible index between one 
> time series and another.  You'd need to check the indices of each of 
> the input series, or of the custom order sizing function from the 
> script to see what's going on.  If the output from your run and the 
> blog post agree, I wouldn't bother.
>
>> It would be nice to get rid of the warnings.
>
>
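
(For my own notes: the "cheat" described above seems to come down to 
recording all the transactions first and only then calling the blotter 
update functions, either once at the end or on some coarser schedule such 
as month ends.  A sketch only, where ltportfolio is the portfolio name 
variable from the blog script and the account name is a placeholder:)

# record all transactions with addTxn() inside the loop, with no update*
# calls there, then bring everything up to date in one pass at the end
updatePortf(ltportfolio)
updateAcct("longtrend_account")    # placeholder account name
updateEndEq("longtrend_account")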


