[R] Do I need to transform backtest returns before using pbo (probability of backtest overfitting) package functions?
Bert Gunter
bgunter.4567 at gmail.com
Tue Nov 21 13:10:14 CET 2017
Wrong list.
Post on R-SIG-Finance instead.
Cheers,
Bert
On Nov 20, 2017 11:25 PM, "Joe O" <joerodonnell at gmail.com> wrote:
Hello,
I'm trying to understand how to use the pbo package by looking at a
vignette. I'm curious about a part of the vignette that creates simulated
returns data. The package author transforms his simulated returns in a way
that I'm unfamiliar with, and that I haven't been able to find an
explanation for after searching around. I'm curious if I need to replicate
the transformation with real returns. For context, here is the vignette
(cleaned up a bit to make it reproducible):
(Full vignette:
https://cran.r-project.org/web/packages/pbo/vignettes/pbo.html)
library(pbo)
# First, we assemble the trials into an N x T matrix where each column
# represents a trial and each trial has the same length T. This example
# is random data so the backtest should be overfit.
set.seed(765)
n <- 100
t <- 2400
m <- data.frame(matrix(rnorm(n*t), nrow = t, ncol = n,
                       dimnames = list(1:t, 1:n)), check.names = FALSE)
sr_base <- 0
mu_base <- sr_base/(252.0)
sigma_base <- 1.00/(252.0)**0.5
for ( i in 1:n ) {
  m[,i] <- m[,i] * sigma_base / sd(m[,i]) # rescale
  m[,i] <- m[,i] + mu_base - mean(m[,i]) # recenter
}
# We can use any performance evaluation function that can work with the
# reassembled sub-matrices during the cross-validation iterations.
# Following the original paper we can use the Sharpe ratio as
sharpe <- function(x, rf = 0.03/252) {
  sr <- apply(x, 2, function(col) {
    er <- col - rf
    return(mean(er)/sd(er))
  })
  return(sr)
}
# Now that we have the trials matrix we can pass it to the pbo function
# for analysis.
my_pbo <- pbo(m, s = 8, f = sharpe, threshold = 0)
summary(my_pbo)
Here's the portion I'm curious about:
sr_base <- 0
mu_base <- sr_base/(252.0)
sigma_base <- 1.00/(252.0)**0.5
for ( i in 1:n ) {
  m[,i] <- m[,i] * sigma_base / sd(m[,i]) # rescale
  m[,i] <- m[,i] + mu_base - mean(m[,i]) # recenter
}
Why is the data transformed within the for loop, and does this kind of
rescaling and recentering need to be done with real returns? Or is this
just something the author is doing to make his simulated returns look more
like the real thing?
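For what it's worth, a quick numerical check (my own sketch, reusing the vignette's constants on a small matrix, not part of the pbo API) suggests the loop simply forces every column to have exactly the target daily moments, so all trials share the same daily sd (sigma_base) and daily mean (mu_base):

```r
# Sketch: check what the vignette's rescale/recenter loop enforces.
set.seed(765)
t <- 2400
n <- 5                                  # a few trials is enough to check
m <- matrix(rnorm(n * t), nrow = t)
sigma_base <- 1.00 / sqrt(252)          # target daily sd
mu_base    <- 0 / 252                   # target daily mean
for (i in 1:n) {
  m[, i] <- m[, i] * sigma_base / sd(m[, i])   # sd of column i becomes sigma_base
  m[, i] <- m[, i] + mu_base - mean(m[, i])    # mean of column i becomes mu_base
}
# After the loop every column has identical daily moments:
all.equal(apply(m, 2, sd), rep(sigma_base, n))   # TRUE
all.equal(colMeans(m), rep(mu_base, n))          # TRUE
```

If that reading is right, each simulated trial has an annualized Sharpe ratio of exactly sr_base = 0 by construction, so any apparent skill found by the backtest selection must be overfitting.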
Googling around turned up some articles regarding scaling volatility to the
square root of time, but the scaling in the code here doesn't look quite
like what I've seen. The rescalings I've seen multiply some short-term
(i.e., daily) measure of volatility by the square root of time, but this
isn't quite that. Also, the documentation for the package doesn't include
this chunk of rescaling and recentering code. Documentation:
https://cran.r-project.org/web/packages/pbo/pbo.pdf
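My guess (unconfirmed) is that the constants encode the usual square-root-of-time rule run in reverse: instead of annualizing a daily volatility by multiplying by sqrt(252), the code picks a target annualized volatility of 1.00 (100%) and divides by sqrt(252) to get the daily target sd:

```r
# Sketch of the square-root-of-time relationship the constants seem to
# encode (252 trading days per year is the vignette's assumption).
ann_vol   <- 1.00                 # target annualized volatility (100%)
daily_vol <- ann_vol / sqrt(252)  # implied daily sd, i.e. sigma_base
# Annualizing the daily figure recovers the annual target:
all.equal(daily_vol * sqrt(252), ann_vol)  # TRUE
```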
So:

- Why is the data transformed in this way, and what is the result of the
transformation?
- Is it only necessary for this simulated data, or do I need to similarly
transform real returns?
I read in the posting guide that stats questions are acceptable given
certain conditions; I hope this counts. Thanks for reading,
Joe
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.