[R-sig-Finance] Confidence intervals for spread returns

Mon Jun 26 02:21:15 CEST 2006

Hello Mr. Kane,

My own (very limited) experience suggests that there are two large
sources of error here that are not adequately considered if one were
to come up with confidence intervals. On a purely statistical basis,
the confidence interval can be calculated by using bootstrapping on
the realized returns, either by just taking the quarterly returns
(being careful not to count annual returns but considering them each
quarter -- an error I have often seen) or by somehow combining/
randomizing the realized returns of each observed decile over time.
However, I think the key to the answer is in considering how the
results will be used.
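
A minimal sketch of the purely statistical calculation, assuming one
spread return per non-overlapping quarter. The function name
bootstrap_ci and the return figures are hypothetical and made up for
illustration. It resamples quarters independently, so it ignores any
autocorrelation and does nothing about the data biases discussed next.

```python
import random
import statistics


def bootstrap_ci(returns, n_boot=10_000, level=0.95, seed=0):
    """Percentile bootstrap CI for the mean of per-period returns."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(returns)
    # Resample the observed periods with replacement and record each mean.
    means = sorted(
        statistics.fmean(rng.choices(returns, k=n)) for _ in range(n_boot)
    )
    tail = (1.0 - level) / 2.0
    lo = means[int(tail * n_boot)]
    hi = means[min(n_boot - 1, int((1.0 - tail) * n_boot))]
    return lo, hi


# Made-up quarterly spread returns, one per non-overlapping quarter.
quarterly_spread = [0.031, -0.012, 0.045, 0.020, 0.008, -0.025,
                    0.039, 0.017, 0.052, -0.004, 0.026, 0.011]

lo, hi = bootstrap_ci(quarterly_spread)
print(f"95% interval for the mean quarterly spread: {lo:.4f} to {hi:.4f}")
```

Feeding in rolling annual returns observed every quarter would reuse
each quarter four times and make the interval look tighter than the
data justify, which is the error mentioned above.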

1. Suppose one were to do bootstrapping; then the confidence interval
would suffer greatly from potential survivorship, reporting, backfill,
and similar biases, because these often tend to be concentrated in the extreme
deciles. And estimation error of the confidence interval would itself
be quite large because even in US markets we will quickly run out of
data. Now sensible people obviously control for these things, but once
the results are committed to a summary statistic like "the 95%
confidence interval for the hedged return is 5% to 10%", then all that
the bosses of people like me will remember will be that summary, just
like all that anyone remembers of the BHB asset allocation study is
that "93.6% of the variation" statistic. The streakiness
(autocorrelation) of the results, the fact that the hedged return
itself is not normally distributed, any trend in the hedged return (is
the factor being arbed away?), the potential error in the confidence
interval itself, etc. are all forgotten.

2. Transaction costs, the speed of execution, messed up incentives,
etc. don't appear to me to be second-order effects. In fact, it may be
an entirely defensible position that the reason most of the factors
continue to be useful alpha generators is precisely these frictions.
As such, the confidence interval will depend on the transaction
details: if the boundaries of the confidence interval are highly
sensitive to the transaction cost assumption (which I suspect they
will be), then stating a single confidence interval would end up being
misleading to the user of the information.
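
One way to check that sensitivity is to recompute the interval under
several cost assumptions and look at how far its boundaries move. The
sketch below is only an illustration: the function net_spread_ci, the
turnover figures, the cost levels, and the returns are all
hypothetical, and the cost model (cost in basis points times quarterly
turnover) is an assumed simplification, not anyone's actual trading
cost model.

```python
import random
import statistics


def net_spread_ci(gross, turnover, cost_bps, n_boot=5_000, level=0.95, seed=0):
    """Percentile bootstrap CI for the mean net return at one cost level."""
    if len(gross) != len(turnover):
        raise ValueError("gross and turnover must have the same length")
    cost = cost_bps / 10_000.0  # basis points to a decimal fraction
    # Net return per quarter: gross spread minus cost paid on traded value.
    net = [g - cost * t for g, t in zip(gross, turnover)]
    rng = random.Random(seed)  # same seed, so every cost level sees the same resamples
    n = len(net)
    means = sorted(
        statistics.fmean(rng.choices(net, k=n)) for _ in range(n_boot)
    )
    tail = (1.0 - level) / 2.0
    lo = means[int(tail * n_boot)]
    hi = means[min(n_boot - 1, int((1.0 - tail) * n_boot))]
    return lo, hi


# Made-up gross quarterly spread returns and quarterly turnover
# (1.0 means the whole long-short book was traded once in the quarter).
gross = [0.031, -0.012, 0.045, 0.020, 0.008, -0.025,
         0.039, 0.017, 0.052, -0.004, 0.026, 0.011]
turnover = [0.8, 1.1, 0.9, 1.0, 1.2, 0.7, 0.9, 1.0, 1.1, 0.8, 1.0, 0.9]

intervals = {bps: net_spread_ci(gross, turnover, bps) for bps in (0, 25, 50, 100)}
for bps, (lo, hi) in intervals.items():
    print(f"cost {bps:>3} bps: {lo:+.4f} to {hi:+.4f}")
```

Reporting the whole table, rather than one row of it, makes the
dependence on the cost assumption visible to the reader.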

Now, I know that you did not ask about either of the above things in
your original question. But I am trying to assert that the user of the
information may rely on the confidence interval entirely too much
given the HUGE potential for errors. Somehow, that fact should be
conveyed.

I just got done reading the Edward Tufte books again, and I think a
better goal for your (I should say "our", since I have been meaning to
get involved since recovering from my illness) project would be to
create a dashboard view of the performance of the portfolio
simulation, rather than reducing it to a single number or tuple.
People will still tend to compare the summary statistics, but at least
we would not be guilty of reporting the statistic in a way that hides
"the truth".

With warm regards,

Vivek Satsangi

p.s. While we are on the topic of ways to conduct the search for
alpha, does anyone have recommendations on books like the Tufte books
on how to use statistical graphics for pattern recognition (rather
than to convey information or already identified patterns, which is
what the Tufte books cover nicely)? For example, I am hoping that the
books people suggest would answer the question, "what would be a good
way to construct, and things to consider while designing, graphics
that help us rapidly identify useful factors vs. lousy ones?"
(wouldn't *everyone* like to know the answer to that :-) ).

David Kane wrote:
> Question: How might one calculate a reasonable confidence interval
> around this 8% spread return?