[RsR] [R-SIG-Finance] Outliers in the market model that's used to estimate `beta' of a stock
Matias Salibian-Barrera
matias at stat.ubc.ca
Thu Sep 18 19:29:06 CEST 2008
I haven't read "Fooled by Randomness", but I did start reading "The Black
Swan", and although in general I like provocative books that challenge my
point of view, I found his main thesis too slight to warrant so many
words... As I read it, his main argument is with those who misinterpret
and misuse statistics (particularly when they do so for their own
benefit), not with statistics itself, which is, after all, always based
on stated assumptions.
> [snip] In fact, he would say that a model works
> until it doesn't.
Which is a fair statement that also applies to science in general:
"theories work until they are proved wrong", the whole "falsifiability"
argument (cf. Popper vs. Kuhn vs. Feyerabend vs. ...). I believe robust
statistics can help you determine when your model (theory) has stopped
working.
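To make that point concrete with a toy example (simulated data, nothing
from this thread, and assuming the MASS package is installed): once part
of the data stops following the assumed model, the least-squares and
robust fits start to disagree.

library(MASS)
set.seed(1)
x0 <- rnorm(200)
y0 <- 0.8 * x0 + rnorm(200, sd = 0.2)
y0[191:200] <- y0[191:200] + 5      # a chunk of data the assumed model no longer describes
coef(lm(y0 ~ x0))                   # least squares gets pulled by the new regime
coef(rlm(y0 ~ x0, method = "MM"))   # the MM-estimate still tracks the bulk of the data
# a large gap between the two fits is one signal that the model has stopped working
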
In any case, with respect to the old "data cleaning versus robust
estimators" discussion, I would point the interested reader to the first
chapter of Maronna, Martin and Yohai's book
(http://books.google.com/books?id=YD--AAAACAAJ&dq=martin+maronna+yohai)
and, for some more specific inference implications, to the first chapter
of my PhD dissertation. Essentially, the two main issues are: (a)
detecting outliers with non-robust estimators does not work well in
general (and even if/when it does, see the next point); (b) if you
remove (or alter) observations, all subsequent probabilistic statements
(p-values, standard errors, etc.) are conditional on the very non-linear
cleaning operation you performed, and are therefore both wrong at face
value and not easy to correct. Robust estimators incorporate the
down-weighting and its effect on the corresponding inference in one
step, and are thus, IMHO, to be preferred.
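
As a concrete sketch of that last point, using the data Ajay posted
below (assuming the .rda file is still reachable and that the robustbase
package is installed):

library(robustbase)
print(load(url("http://www.mayin.org/ajayshah/tmp/infosys_mm.rda")))  # loads the data frame 'x'
# MM-regression: the down-weighting and its effect on the inference happen in one step
fit.mm <- lmrob(rj ~ rM, data = x)
summary(fit.mm)                     # these standard errors already account for the weighting
# which observations did the robust fit effectively down-weight?
w <- weights(fit.mm, type = "robustness")
head(order(w), 10)                  # indices of the most heavily down-weighted points
# deleting low-weight points by hand and re-fitting least squares gives p-values
# that are conditional on that very non-linear deletion step
# (assumes complete cases, so that w lines up with the rows of x)
summary(lm(rj ~ rM, data = x[w >= 0.1, ]))
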
Matias
markleeds at verizon.net wrote:
> Hi: I don't know if you have read "Fooled by Randomness" by Nassim Taleb
> (spelling?), but he essentially argues, using very non-statistical but
> nevertheless strong arguments (it's not a stats or quant finance book),
> that outliers in finance are not modellable, and that you shouldn't
> claim you can model them because you'd be lying. In fact, he would say
> that a model works until it doesn't.
>
> Anyway, it's an interesting book that indirectly addresses your comment
> below (at a little too much length, actually: you can get what he's
> saying in the first 50 pages, and it's about 200 pages), so I figured I
> would mention it in case you were interested.
>
>
> On Thu, Sep 18, 2008 at 11:36 AM, Ajay Shah wrote:
>
>> In continuation of the discussion on `Winsorisation' that has taken
>> place on r-sig-finance today, I thought I'd present all of you with an
>> interesting dataset and a question.
>>
>> This data is the daily stock returns of the large Indian software firm
>> `Infosys'. (This is the symbol `INFY' on NASDAQ). It is a large number
>> of observations of daily returns (i.e. percentage changes of the
>> adjusted stock price).
>>
>> Load the data in --
>>
>>
>> print(load(url("http://www.mayin.org/ajayshah/tmp/infosys_mm.rda")))
>> str(x)
>> summary(x)
>> sapply(x, sd)   # column-wise standard deviations (sd() no longer accepts a data frame)
>>
>> The name `rj' is used for returns on Infosys, and `rM' is used for
>> returns on the stock market index (Nifty). There are three really
>> weird observations in this.
>>
>> weird.rj <- c(1896,2395)
>> weird.rM <- 2672
>> x[weird.rj,]
>> x[weird.rM,]
>>
>> As you can see, these observations are quite remarkable given the
>> small standard deviations that we saw above. There is absolutely no
>> measurement error here. These things actually happened.
>>
>> Now consider a typical application: using this to estimate a market
>> model. The goal here is to estimate the coefficient of a regression of
>> rj on rM.
>>
>> # A regression with all obs
>> summary(lm(rj ~ rM, data=x))
>>
>> # Drop the weird rj --
>> summary(lm(rj ~ rM, data=x[-weird.rj,]))
>>
>> # Drop the weird rM --
>> summary(lm(rj ~ rM, data=x[-weird.rM,]))
>>
>> # Drop both kinds of weird observations --
>> summary(lm(rj ~ rM, data=x[-c(weird.rM,weird.rj),]))
>>
>> # Robust regressions
>> library(MASS)
>> summary(rlm(rj ~ rM, data=x))
>> summary(rlm(rj ~ rM, method="MM", data=x))
>> library(robust)
>> summary(lmRob(rj ~ rM, data=x))
>> library(quantreg)
>> summary(rq(rj ~ rM, tau=0.5, data=x))
>>
>> So you see, we have a variety of different estimates for the slope
>> (which is termed `beta' in finance). What value would you trust the
>> most?
>>
>> And, would winsorisation using either my code
>> (https://stat.ethz.ch/pipermail/r-sig-finance/2008q3/002921.html) or
>> Patrick Burns' code
>> (https://stat.ethz.ch/pipermail/r-sig-finance/2008q3/002923.html) be a
>> good idea here?
>>
>> I'm instinctively unhappy with any scheme based on discarding
>> observations that I'm absolutely sure have no measurement error. We
>> have to model the weirdness of this data generating process, not
>> ignore it.
>>
>> --
>> Ajay Shah http://www.mayin.org/ajayshah ajayshah at mayin.org
>> http://ajayshahblog.blogspot.com
>> <*(:-? - wizard who doesn't know the answer.
>>
>
--
_____________________________________________________
Matias Salibian-Barrera - Department of Statistics
The University of British Columbia
Phone: (604) 822-3410 - Fax: (604) 822-6960
"The plural of anecdote is not data" (George Stigler?)