[R-SIG-Finance] Outliers in the market model that's used to estimate `beta' of a stock

Thu Sep 18 18:58:02 CEST 2008

Hi: I don't know if you have read "Fooled by Randomness" by Nassim
Taleb, but he essentially argues, using very non-statistical but
nevertheless strong arguments (it's not a stats or quant finance book),
that outliers in finance cannot be modelled, and that you shouldn't
claim you can model them because you'd be lying. In fact, he would say
that a model works until it doesn't.

Anyway, it's an interesting book that indirectly addresses your comment
below. It goes on a little too long, actually: it's about 200 pages and
you can get what he's saying in the first 50. I figured I would mention
it in case you were interested.

On Thu, Sep 18, 2008 at 11:36 AM, Ajay Shah wrote:

> In continuation of the discussion on `Winsorisation' that has taken
> place on r-sig-finance today, I thought I'd present all of you with an
> interesting dataset and a question.
>
> This data is the daily stock returns of the large Indian software firm
> `Infosys'. (This is the symbol `INFY' on NASDAQ). It is a large number
> of observations of daily returns (i.e. percentage changes of the
> adjusted stock price).
>
> Load the data in --
>
>     print(load(url("http://www.mayin.org/ajayshah/tmp/infosys_mm.rda")))
>     str(x)
>     summary(x)
>     apply(x, 2, sd)   # per-column sd; sd(x) on a data frame errors in current R
>
> The name `rj' is used for returns on Infosys, and `rM' is used for
> returns on the stock market index (Nifty). There are three really
> weird observations in this.
>
>     weird.rj <- c(1896,2395)
>     weird.rM <- 2672
>     x[weird.rj,]
>     x[weird.rM,]
>
> As you can see, these observations are quite remarkable given the
> small standard deviations that we saw above. There is absolutely no
> measurement error here. These things actually happened.
>
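One way to see how remarkable such observations are is to express each
return in robust standard-deviation units. Below is a minimal sketch on
simulated data: the data frame `sim`, its parameters, and the cutoff of 8
are all made up for illustration and are not the Infosys data. It uses the
median and mad() rather than mean and sd() so that the outliers do not
inflate their own yardstick.

```r
# Simulated daily returns in percent; NOT the Infosys data.
set.seed(1)
n  <- 3000
rM <- rnorm(n, mean = 0.05, sd = 1.5)    # hypothetical index returns
rj <- 0.9 * rM + rnorm(n, sd = 2)        # hypothetical stock returns
rj[c(1896, 2395)] <- c(-60, 45)          # plant two extreme stock returns
sim <- data.frame(rj = rj, rM = rM)

# Robust z-score: distance from the median in units of the MAD
# (mad() is scaled to be comparable to sd under normality).
robust.z <- function(v) (v - median(v)) / mad(v)

z <- sapply(sim, robust.z)               # n x 2 matrix, columns rj and rM
flagged <- which(abs(z[, "rj"]) > 8)     # 8 is an arbitrary cutoff
flagged
```

On the real data, the same robust.z() could be applied to the columns of
x to check whether rows 1896, 2395 and 2672 stand out in the same way.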
> Now consider a typical application: using this to estimate a market
> model. The goal here is to estimate the coefficient of a regression of
> rj on rM.
>
>     # A regression with all obs
>     summary(lm(rj ~ rM, data=x))
>
>     # Drop the weird rj --
>     summary(lm(rj ~ rM, data=x[-weird.rj,]))
>
>     # Drop the weird rM --
>     summary(lm(rj ~ rM, data=x[-weird.rM,]))
>
>     # Drop both kinds of weird observations --
>     summary(lm(rj ~ rM, data=x[-c(weird.rM,weird.rj),]))
>
>     # Robust regressions
>     library(MASS)
>     summary(rlm(rj ~ rM, data=x))
>     summary(rlm(rj ~ rM, method="MM", data=x))
>     library(robust)
>     summary(lmRob(rj ~ rM, data=x))
>     library(quantreg)
>     summary(rq(rj ~ rM, tau=0.5, data=x))
>
> So you see, we have a variety of different estimates for the slope
> (which is termed `beta' in finance). What value would you trust the
> most?
>
> And, would winsorisation using either my code
> (https://stat.ethz.ch/pipermail/r-sig-finance/2008q3/002921.html) or
> Patrick Burns' code
> (https://stat.ethz.ch/pipermail/r-sig-finance/2008q3/002923.html) be a
> good idea here?
>
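To make the winsorisation question concrete, here is a minimal sketch on
simulated data. The helper `winsorise` is a hypothetical quantile-clipping
function written for this illustration. It is not the code from either of
the two posts linked above, and the 1%/99% cutoffs, the true slope of 0.9
and the planted outlier are all assumptions.

```r
# Simulated returns with a known slope of 0.9; NOT the Infosys data.
set.seed(42)
n  <- 3000
rM <- rnorm(n, sd = 1.5)
rj <- 0.9 * rM + rnorm(n, sd = 2)
rM[2672] <- -12; rj[2672] <- 5           # planted high-leverage index outlier
sim <- data.frame(rj = rj, rM = rM)

# Hypothetical helper: clip values outside the [p, 1 - p] quantiles.
winsorise <- function(v, p = 0.01) {
  q <- quantile(v, c(p, 1 - p), names = FALSE)
  pmin(pmax(v, q[1]), q[2])
}

# Winsorise each column separately, then re-estimate the slope.
wins <- as.data.frame(lapply(sim, winsorise))

beta.ols  <- coef(lm(rj ~ rM, data = sim))[["rM"]]
beta.wins <- coef(lm(rj ~ rM, data = wins))[["rM"]]
c(ols = beta.ols, winsorised = beta.wins)
```

Note that clipping each series separately also changes the joint
distribution of the ordinary observations in the tails, so the winsorised
slope need not be closer to the true value than the plain OLS slope.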
> I'm instinctively unhappy with any scheme based on discarding
> observations that I'm absolutely sure have no measurement error. We
> have to model the weirdness of this data generating process, not
> ignore it.
>
> -- 
> Ajay Shah 
> http://www.mayin.org/ajayshah  ajayshah at mayin.org 
> http://ajayshahblog.blogspot.com
> <*(:-? - wizard who doesn't know the answer.