[R] Do I need to transform backtest returns before using pbo (probability of backtest overfitting) package functions?

Joe O joerodonnell at gmail.com
Tue Nov 21 22:19:32 CET 2017


Fantastic! Thank you for your help, -Joe

On Tue, Nov 21, 2017 at 2:17 PM, Eric Berger <ericjberger at gmail.com> wrote:

> Correct
>
> On 21 Nov 2017, at 22:42, Joe O <joerodonnell at gmail.com> wrote:
>
> Hi Eric,
>
> Thank you, that helps a lot. If I'm understanding correctly, to use
> actual returns from backtests rather than simulated returns, I would need
> to make sure my risk-adjusted return measure (the Sharpe ratio in this
> case) is on the same scale as my returns (i.e. daily returns with a daily
> Sharpe ratio, monthly with monthly, etc.). And I wouldn't need to transform
> the returns the way the simulated returns are transformed in the vignette,
> since real returns will have whatever mean and std dev they happen to
> have. Is that correct?
>
> Thanks, -Joe
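
To illustrate the point about matching scales, here is a rough sketch that applies the vignette's sharpe function to untransformed returns at two frequencies. The return matrix is a random placeholder standing in for real backtest output, and the 21-trading-days-per-month grouping and the monthly rate rf = 0.03/12 are assumptions made only for this example.

```r
# The sharpe function from the vignette; rf must be on the same scale as x
sharpe <- function(x, rf = 0.03 / 252) {
  apply(x, 2, function(col) {
    er <- col - rf
    mean(er) / sd(er)
  })
}

set.seed(42)
# Placeholder for real daily backtest returns (decimal): 504 days x 3 strategies
daily <- matrix(rnorm(504 * 3, mean = 0.0005, sd = 0.01), ncol = 3)

# Daily returns pair with a daily risk-free rate
sr_daily <- sharpe(daily, rf = 0.03 / 252)

# Aggregate to monthly by compounding blocks of 21 trading days (an assumption)
month_id <- rep(seq_len(504 / 21), each = 21)
monthly <- apply(daily, 2, function(col) {
  tapply(col, month_id, function(r) prod(1 + r) - 1)
})

# Monthly returns pair with a monthly risk-free rate
sr_monthly <- sharpe(monthly, rf = 0.03 / 12)
```

With the pbo call shown in the original post, the matching function would then be passed as f, for example f = function(x) sharpe(x, rf = 0.03/12) for a matrix of monthly returns. No re-scaling or re-centering is applied to the returns themselves.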
>
>
> On Tue, Nov 21, 2017 at 5:36 AM, Eric Berger <ericjberger at gmail.com>
> wrote:
>
>> [re-sending - previous email went out by accident before complete]
>> Hi Joe,
>> The centering and re-scaling is done for the purposes of his example, and
>> also to be consistent with his definition of the sharpe function.
>> In particular, note that the sharpe function has the rf (riskfree)
>> parameter with a default value of .03/252 i.e. an ANNUAL 3% rate converted
>> to a DAILY rate, expressed in decimal.
>> That means that the other argument to this function, x, should be DAILY
>> returns, expressed in decimal.
>>
>> Suppose he wanted to create random data from a distribution of returns
>> with ANNUAL mean MU_A and ANNUAL std deviation SIGMA_A, both stated in
>> decimal.
>> The equivalent DAILY returns would have mean MU_D = MU_A / 252 and
>> standard deviation SIGMA_D = SIGMA_A / SQRT(252).
>>
>> He calls MU_D by the name mu_base and SIGMA_D by the name sigma_base.
>>
>> His loop now converts the random numbers in his matrix so that each
>> column has mean MU_D and std deviation SIGMA_D.
>>
>> HTH,
>> Eric
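
A minimal sketch of the conversion and the loop described above, in base R. The names mu_a, sigma_a, mu_d, sigma_d, n_trials and n_days are introduced here for illustration; the annual values 0 and 1 are chosen to mirror the vignette's sr_base and sigma_base, and 252 trading days per year is assumed. The last two lines only confirm that every column ends up with the target daily mean and standard deviation.

```r
set.seed(1)

# Hypothetical annual figures, in decimal (chosen to mirror the vignette)
mu_a    <- 0.00   # annual mean return
sigma_a <- 1.00   # annual standard deviation

# Convert to daily, assuming 252 trading days per year
mu_d    <- mu_a / 252
sigma_d <- sigma_a / sqrt(252)

n_trials <- 5     # columns (trials)
n_days   <- 1000  # rows (daily observations)
m <- matrix(rnorm(n_trials * n_days), nrow = n_days, ncol = n_trials)

for (i in seq_len(n_trials)) {
  m[, i] <- m[, i] * sigma_d / sd(m[, i])   # re-scale: column sd becomes sigma_d
  m[, i] <- m[, i] + mu_d - mean(m[, i])    # re-center: column mean becomes mu_d
}

col_means <- apply(m, 2, mean)  # each equals mu_d up to rounding error
col_sds   <- apply(m, 2, sd)    # each equals sigma_d up to rounding error
```

The re-scaling is done first because adding a constant afterwards shifts the mean without changing the standard deviation.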
>>
>>
>>
>> On Tue, Nov 21, 2017 at 2:33 PM, Eric Berger <ericjberger at gmail.com>
>> wrote:
>>
>>> Hi Joe,
>>> The centering and re-scaling is done for the purposes of his example,
>>> and also to be consistent with his definition of the sharpe function.
>>> In particular, note that the sharpe function has the rf (riskfree)
>>> parameter with a default value of .03/252 i.e. an ANNUAL 3% rate converted
>>> to a DAILY rate, expressed in decimal.
>>> That means that the other argument to this function, x, should be DAILY
>>> returns, expressed in decimal.
>>>
>>> Suppose he wanted to create random data from a distribution of returns
>>> with ANNUAL mean MU_A and ANNUAL std deviation SIGMA_A, both stated in
>>> decimal.
>>> The equivalent DAILY returns would have mean MU_D = MU_A / 252 and
>>> standard deviation SIGMA_D = SIGMA_A / SQRT(252).
>>>
>>> Then he does two steps: (1) generate a matrix of random values from the
>>> N(0,1) distribution; (2) re-scale and re-center each column so that it
>>> has DAILY mean MU_D and DAILY std deviation SIGMA_D:
>>> sr_base <- 0
>>> mu_base <- sr_base/(252.0)
>>> sigma_base <- 1.00/(252.0)**0.5
>>> for ( i in 1:n ) {
>>>   m[,i] = m[,i] * sigma_base / sd(m[,i]) # re-scale
>>>   m[,i] = m[,i] + mu_base - mean(m[,i]) # re-center
>>> }
>>>
>>> On Tue, Nov 21, 2017 at 2:10 PM, Bert Gunter <bgunter.4567 at gmail.com>
>>> wrote:
>>>
>>>> Wrong list.
>>>>
>>>> Post on r-sig-finance instead.
>>>>
>>>> Cheers,
>>>> Bert
>>>>
>>>>
>>>>
>>>> On Nov 20, 2017 11:25 PM, "Joe O" <joerodonnell at gmail.com> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I'm trying to understand how to use the pbo package by looking at a
>>>> vignette. I'm curious about a part of the vignette that creates simulated
>>>> returns data. The package author transforms his simulated returns in a way
>>>> that I'm unfamiliar with, and that I haven't been able to find an
>>>> explanation for after searching around. I'm curious if I need to replicate
>>>> the transformation with real returns. For context, here is the vignette
>>>> (cleaned up a bit to make it reproducible):
>>>>
>>>> (Full vignette:
>>>> https://cran.r-project.org/web/packages/pbo/vignettes/pbo.html)
>>>>
>>>> library(pbo)
>>>> #First, we assemble the trials into a TxN matrix where each column
>>>> #represents a trial and each trial has the same length T. This example
>>>> #is random data so the backtest should be overfit.
>>>>
>>>> set.seed(765)
>>>> n <- 100
>>>> t <- 2400
>>>> m <- data.frame(matrix(rnorm(n*t),nrow=t,ncol=n,
>>>>                        dimnames=list(1:t,1:n)), check.names=FALSE)
>>>>
>>>> sr_base <- 0
>>>> mu_base <- sr_base/(252.0)
>>>> sigma_base <- 1.00/(252.0)**0.5
>>>> for ( i in 1:n ) {
>>>>   m[,i] = m[,i] * sigma_base / sd(m[,i]) # re-scale
>>>>   m[,i] = m[,i] + mu_base - mean(m[,i]) # re-center
>>>> }
>>>> #We can use any performance evaluation function that can work with the
>>>> #reassembled sub-matrices during the cross validation iterations.
>>>> #Following the original paper we can use the Sharpe ratio as
>>>>
>>>> sharpe <- function(x,rf=0.03/252) {
>>>>   sr <- apply(x,2,function(col) {
>>>>     er = col - rf
>>>>     return(mean(er)/sd(er))
>>>>   })
>>>>   return(sr)
>>>> }
>>>> #Now that we have the trials matrix we can pass it to the pbo function
>>>> #for analysis.
>>>>
>>>> my_pbo <- pbo(m,s=8,f=sharpe,threshold=0)
>>>>
>>>> summary(my_pbo)
>>>>
>>>> Here's the portion I'm curious about:
>>>>
>>>> sr_base <- 0
>>>> mu_base <- sr_base/(252.0)
>>>> sigma_base <- 1.00/(252.0)**0.5
>>>> for ( i in 1:n ) {
>>>>   m[,i] = m[,i] * sigma_base / sd(m[,i]) # re-scale
>>>>   m[,i] = m[,i] + mu_base - mean(m[,i]) # re-center
>>>> }
>>>>
>>>> Why is the data transformed within the for loop, and does this kind of
>>>> re-scaling and re-centering need to be done with real returns? Or is this
>>>> just something the author is doing to make his simulated returns look more
>>>> like the real thing?
>>>>
>>>> Googling around turned up some articles regarding scaling volatility to the
>>>> square root of time, but the scaling in the code here doesn't look quite
>>>> like what I've seen. Re-scalings I've seen involve multiplying some short
>>>> term (i.e. daily) measure of volatility by the root of time, but this isn't
>>>> quite that. Also, the documentation for the package doesn't include this
>>>> chunk of re-scaling and re-centering code. Documentation:
>>>> https://cran.r-project.org/web/packages/pbo/pbo.pdf
>>>>
>>>> So:
>>>>
>>>>    - Why is the data transformed in this way, and what is the result of
>>>>      this transformation?
>>>>    - Is it only necessary for this simulated data, or do I need to
>>>>      similarly transform real returns?
>>>>
>>>> I read in the posting guide that stats questions are acceptable given
>>>> certain conditions, I hope this counts. Thanks for reading,
>>>>
>>>> -Joe
>>>>
>>>>         [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>
>
