[R-SIG-Finance] [RsR] Outliers in the market model that's used to estimate `beta' of a stock

Thu Sep 18 19:47:29 CEST 2008

Hi Matias: yes, for the most part he wasn't dissing statistics. He was
definitely also talking about the misuses, but I think he was claiming
that models in finance, whether they come from statistics, physics, or
even non-quant approaches, are kind of assumed to be right until they
don't work.

That's true in all science, but it puts finance on quite shaky ground
because there are people trading serious money based on the idea that
what they are doing is valid and working correctly. This is obviously
kind of relevant to what's going on right now. Thanks also for your
references.

mark

On Thu, Sep 18, 2008 at  1:29 PM, Matias Salibian-Barrera wrote:

> Haven't read "Fooled by randomness", but did start reading Black Swan, 
> and although in general I like provocative books that challenge my 
> points of view, I found his main thesis to be too short to warrant so 
> many words... I took it that his main argument was with those who 
> misinterpret and misuse statistics (particularly when they do it for 
> their own benefit), not with statistics itself, which is always based 
> on assumptions etc.
>
>> [snip] In fact, he would say that a model works until it doesn't.
>
> Which is a fair statement, that also applies to science in general, 
> "theories work until they are proved wrong", and the whole 
> "falsifiability" argument (cf. Popper vs. Kuhn vs. Feyerabend vs...). 
> I believe robust statistics can help you determine when your model
> (theory) has stopped working.
>
> In any case, with respect to the old "data cleaning versus robust
> estimators" discussion, I would point the interested reader to the first
> chapter of Maronna, Martin and Yohai's book
> (http://books.google.com/books?id=YD--AAAACAAJ&dq=martin+maronna+yohai),
> and, for some more specific inference implications, to the first
> chapter of my PhD dissertation. Essentially, there are a couple of main
> issues: (a) detecting outliers using non-robust estimators does not work
> well in general (but even if / when it does, see my next point); (b)
> if you remove (or alter) observations, all subsequent probabilistic
> statements (p-values, standard errors, etc.) are conditional on the
> very non-linear cleaning operation you did, and are thus both wrong at
> face value and not easy to correct. Robust estimators incorporate the
> down-weighting and its effect on the corresponding inference at once,
> and are thus, IMHO, to be preferred.
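To make the down-weighting point concrete, here is a minimal R sketch on simulated returns. The variable names `rM` and `rj`, all the numbers, and the single planted extreme day are assumptions for illustration, not the Infosys data. The estimator is a Huber M-estimator from MASS::rlm(), which is one robust choice among several.

```r
# Minimal sketch on simulated data (not the Infosys series): one extreme
# day is planted, then OLS and a Huber M-estimator are fitted to the same data.
library(MASS)  # rlm() comes with R's recommended package MASS

set.seed(1)
n  <- 200
rM <- rnorm(n, sd = 0.015)            # hypothetical market returns
rj <- 1.2 * rM + rnorm(n, sd = 0.01)  # hypothetical stock, true beta = 1.2
rj[50] <- 0.30                        # one extreme day, assumed genuine (no measurement error)

ols <- lm(rj ~ rM)
hub <- rlm(rj ~ rM, psi = psi.huber)  # IRLS with Huber weights

# rlm() keeps the final IRLS weights in $w: the extreme day stays in the
# sample but gets a small weight instead of being deleted beforehand
w <- hub$w
round(w[50], 3)
summary(w[-50])
coef(ols)["rM"]
coef(hub)["rM"]
```

The down-weighting happens inside the estimator rather than as a separate cleaning step applied before the fit.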
>
> Matias
>
> markleeds at verizon.net wrote:
>> Hi: I don't know if you've read "Fooled by Randomness" by Nassim Taleb
>> (sp?), but he essentially says, using very non-statistical but
>> nevertheless strong arguments (it's not a stats or a quant finance
>> book), that outliers in finance are not modellable, and that you
>> shouldn't claim you can model them because you'd be lying. In fact,
>> he would say that a model works until it doesn't.
>>
>> Anyway, it's an interesting book that sort of indirectly talks about
>> your comment below (for a little too long, actually: you can get what
>> he's saying in the first 50 pages, and it's about 200 pages), so I
>> figured I would just mention it in case you were interested.
>>
>>
>> On Thu, Sep 18, 2008 at 11:36 AM, Ajay Shah wrote:
>>
>>> In continuation of the discussion on `Winsorisation' that has taken
>>> place on r-sig-finance today, I thought I'd present all of you with an
>>> interesting dataset and a question.
>>>
>>> This data is the daily stock returns of the large Indian software firm
>>> `Infosys'. (This is the symbol `INFY' on NASDAQ.) It contains a large
>>> number of observations of daily returns (i.e. percentage changes of
>>> the adjusted stock price).
>>>
>>> Load the data in --
>>>
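>>>
>>>     # load() restores the saved object(s) and returns their names
>>>     print(load(url("http://www.mayin.org/ajayshah/tmp/infosys_mm.rda")))
>>>     str(x)
>>>     summary(x)
>>>     apply(x, 2, sd)   # column-wise sd; sd() on a whole data frame fails in current R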
>>>
>>> The name `rj' is used for returns on Infosys, and `rM' is used for
>>> returns on the stock market index (Nifty). There are three really
>>> weird observations in this.
>>>
>>>     weird.rj <- c(1896,2395)
>>>     weird.rM <- 2672
>>>     x[weird.rj,]
>>>     x[weird.rM,]
>>>
>>> As you can see, these observations are quite remarkable given the
>>> small standard deviations that we saw above. There is absolutely no
>>> measurement error here. These things actually happened.
>>>
>>> Now consider a typical application: using this to estimate a market
>>> model. The goal here is to estimate the coefficient of a regression of
>>> rj on rM.
>>>
>>>     # A regression with all obs
>>>     summary(lm(rj ~ rM, data=x))
>>>
>>>     # Drop the weird rj --
>>>     summary(lm(rj ~ rM, data=x[-weird.rj,]))
>>>
>>>     # Drop the weird rM --
>>>     summary(lm(rj ~ rM, data=x[-weird.rM,]))
>>>
>>>     # Drop both kinds of weird observations --
>>>     summary(lm(rj ~ rM, data=x[-c(weird.rM,weird.rj),]))
>>>
>>>     # Robust regressions
>>>     library(MASS)
>>>     summary(rlm(rj ~ rM, data=x))
>>>     summary(rlm(rj ~ rM, method="MM", data=x))
>>>     library(robust)
>>>     summary(lmRob(rj ~ rM, data=x))
>>>     library(quantreg)
>>>     summary(rq(rj ~ rM, tau=0.5, data=x))
>>>
>>> So you see, we have a variety of different estimates for the slope
>>> (which is termed `beta' in finance). What value would you trust the
>>> most?
>>>
>>> And would winsorisation, using either my code
>>> (https://stat.ethz.ch/pipermail/r-sig-finance/2008q3/002921.html) or
>>> Patrick Burns' code
>>> (https://stat.ethz.ch/pipermail/r-sig-finance/2008q3/002923.html), be a
>>> good idea here?
>>>
>>> I'm instinctively unhappy with any scheme based on discarding
>>> observations that I'm absolutely sure have no measurement error. We
>>> have to model the weirdness of this data generating process, not
>>> ignore it.
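For comparison with the schemes linked above, here is a minimal sketch of what winsorisation does to a return series. The helper `winsorize()` and its 1%/99% cut-offs are hypothetical choices for illustration. They are not Ajay's or Patrick Burns' code, and the returns are simulated.

```r
# Hypothetical helper, not the code linked above: clamp every value of `v`
# into the interval between two empirical quantiles (1% and 99% by default).
winsorize <- function(v, probs = c(0.01, 0.99)) {
  lims <- quantile(v, probs = probs, na.rm = TRUE, names = FALSE)
  pmin(pmax(v, lims[1]), lims[2])  # raise values below lims[1], cap values above lims[2]
}

set.seed(7)
# Simulated daily returns: 98 ordinary days plus two extreme ones
r  <- c(rnorm(98, sd = 0.02), -0.25, 0.40)
rw <- winsorize(r)

range(r)      # includes the two extreme days
range(rw)     # extremes pulled in to the 1% / 99% quantiles
sum(r != rw)  # number of observations that were altered
```

Every day stays in the sample, but the most extreme values are replaced. Winsorisation is therefore an alteration of the observations, not a way of modelling them.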
>>>
>>> -- 
>>> Ajay Shah http://www.mayin.org/ajayshah  ajayshah at mayin.org 
>>> http://ajayshahblog.blogspot.com
>>> <*(:-? - wizard who doesn't know the answer.
>>>
>>> _______________________________________________
>>> R-SIG-Finance at stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>>> -- Subscriber-posting only.
>>> -- If you want to post, subscribe first.
>>
>> _______________________________________________
>> R-SIG-Robust at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-robust
>
> -- 
> _____________________________________________________
> Matias Salibian-Barrera - Department of Statistics
> The University of British Columbia
> Phone: (604) 822-3410 - Fax: (604) 822-6960
> "The plural of anecdote is not data" (George Stigler?)