[RsR] Outliers in the market model that's used to estimate `beta' of a stock

Eva Cantoni Eva.Cantoni at unige.ch
Thu Sep 18 22:48:01 CEST 2008


As a complement to this discussion, I would like to bring to your 
attention a paper by Marc Genton and Elvezio Ronchetti entitled "Robust 
Prediction of Beta", available from the webpage of M. Genton
http://www.unige.ch/ses/metri/genton/publications.html

Abstract:
The estimation of "beta" plays a basic role in the evaluation of 
expected return and market risk. Typically this is performed by ordinary 
least squares (OLS). To cope with the high sensitivity of OLS to 
outlying observations and to deviations from the normality assumptions, 
several methods suggest using robust estimators. It is argued that, 
from a predictive point of view, the simple use of either OLS or
robust estimators is not sufficient but that some shrinking of the 
robust estimators toward OLS is necessary to reduce the mean squared 
error. The performance of the proposed shrinkage robust estimator is 
shown by means of a small simulation study and on a real data set.
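
As a rough illustration of the shrinkage idea only (not the estimator 
derived in the paper), one can form a fixed-weight convex combination 
of the OLS and robust slopes, using the `x' data frame from Ajay's 
post below; the weight `w' here is a placeholder, whereas the paper 
derives a data-driven weight:

    ## Illustrative sketch only: shrink a robust slope toward OLS.
    ## `w' is an arbitrary placeholder in [0, 1]; Genton & Ronchetti
    ## derive a data-driven weight that minimises mean squared error.
    library(MASS)
    beta.ols <- coef(lm(rj ~ rM, data = x))["rM"]
    beta.rob <- coef(rlm(rj ~ rM, data = x, method = "MM"))["rM"]
    w <- 0.5
    beta.shrink <- w * beta.ols + (1 - w) * beta.rob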

Best regards,
Eva

Ajay Shah wrote:
> Following up on the discussion of `Winsorisation' that took place
> on r-sig-finance today, I thought I'd present all of you with an
> interesting dataset and a question.
> 
> This data is the daily stock returns of the large Indian software firm
> `Infosys' (this is the symbol `INFY' on NASDAQ). It contains a large
> number of observations of daily returns (i.e. percentage changes of the
> adjusted stock price).
> 
> Load the data in --
> 
>     print(load(url("http://www.mayin.org/ajayshah/tmp/infosys_mm.rda")))
>     str(x)
>     summary(x)
>     sd(x)
> 
> The name `rj' is used for returns on Infosys, and `rM' is used for
> returns on the stock market index (Nifty). There are three really
> weird observations in this data.
> 
>     weird.rj <- c(1896,2395)
>     weird.rM <- 2672
>     x[weird.rj,]
>     x[weird.rM,]
> 
> As you can see, these observations are quite remarkable given the
> small standard deviations that we saw above. There is absolutely no
> measurement error here. These things actually happened.
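> 
> One quick check is to express the flagged rows in units of standard
> deviations:
> 
>     # How many standard deviations from the column means are the
>     # flagged observations?
>     z <- scale(x)    # column-wise standardisation of rj and rM
>     z[c(weird.rj, weird.rM), ]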
> 
> Now consider a typical application: using this to estimate a market
> model. The goal here is to estimate the coefficient of a regression of
> rj on rM.
> 
>     # A regression with all obs
>     summary(lm(rj ~ rM, data=x))
> 
>     # Drop the weird rj --
>     summary(lm(rj ~ rM, data=x[-weird.rj,]))
> 
>     # Drop the weird rM --
>     summary(lm(rj ~ rM, data=x[-weird.rM,]))
> 
>     # Drop both kinds of weird observations --
>     summary(lm(rj ~ rM, data=x[-c(weird.rM,weird.rj),]))
> 
>     # Robust regressions
>     library(MASS)
>     summary(rlm(rj ~ rM, data=x))
>     summary(rlm(rj ~ rM, method="MM", data=x))
>     library(robust)
>     summary(lmRob(rj ~ rM, data=x))
>     library(quantreg)
>     summary(rq(rj ~ rM, tau=0.5, data=x))
> 
> So you see, we have a variety of different estimates for the slope
> (which is termed `beta' in finance). What value would you trust the
> most?
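> 
> For instance, the slopes can be collected side by side (this just
> refits the models shown above; the list names are labels of my own
> choosing):
> 
>     # Gather the slope (`beta') from each fit for comparison;
>     # assumes MASS and quantreg are attached as above.
>     fits <- list(ols     = lm(rj ~ rM, data = x),
>                  trimmed = lm(rj ~ rM, data = x[-c(weird.rM, weird.rj), ]),
>                  huber   = rlm(rj ~ rM, data = x),
>                  mm      = rlm(rj ~ rM, data = x, method = "MM"),
>                  medreg  = rq(rj ~ rM, tau = 0.5, data = x))
>     sapply(fits, function(f) coef(f)["rM"])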
> 
> And, would winsorisation using either my code
> (https://stat.ethz.ch/pipermail/r-sig-finance/2008q3/002921.html) or
> Patrick Burns' code
> (https://stat.ethz.ch/pipermail/r-sig-finance/2008q3/002923.html) be a
> good idea here?
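> 
> For concreteness, here is a minimal winsorisation sketch that clips
> each series at fixed quantiles (a generic illustration, not the code
> from either of the links above):
> 
>     # Generic winsorisation: clip a series at the p and 1-p quantiles.
>     # The 0.5% cutoff is an arbitrary choice for illustration.
>     winsorise <- function(r, p = 0.005) {
>       q <- quantile(r, c(p, 1 - p), na.rm = TRUE)
>       pmin(pmax(r, q[1]), q[2])
>     }
>     xw <- data.frame(rj = winsorise(x$rj), rM = winsorise(x$rM))
>     summary(lm(rj ~ rM, data = xw))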
> 
> I'm instinctively unhappy with any scheme based on discarding
> observations that I'm absolutely sure have no measurement error. We
> have to model the weirdness of this data generating process, not
> ignore it.
>
