[RsR] Robust location estimator - an interesting application in finance

Thu Sep 24 23:31:42 CEST 2009

Very interesting post Ajay. I will think about it in more detail in the 
coming days. Meanwhile, I believe I can offer a possible explanation for 
the "failures" of lmrob(). They may be related to the fact that some 
columns of the object "all" have more than half of their observations 
identical to each other.

For example, if I do:

 > load(url("http://www.mayin.org/ajayshah/tmp/all.rda"))
 > mad(na.omit(all[,11]))
[1] 0

The same goes for columns 12, 17, 18, 23 and 24.

What happens in these samples is that the majority of the data don't 
have any variability and thus a robust ("majority rules") scale 
estimator is zero. This breaks the calculations for the estimated 
correlation matrix of the regression estimators (the code is probably 
trying to divide by zero). I agree this should be fixed in the code and 
the output re-arranged accordingly. I'll work on it.
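
A minimal sketch of the effect described above, using only base R. The vector `x`, the matrix `demo` and the helper `zero_scale()` are made up for illustration and are not taken from the "all" data set or from robustbase:

```r
# Hypothetical quotes: 6 of the 9 values are identical (more than half).
x <- c(rep(5.25, 6), 5.20, 5.30, 5.40)

mad(x)  # robust scale: the median absolute deviation is exactly 0
sd(x)   # classical scale: still positive

# Hypothetical helper: indices of the columns whose MAD is zero,
# i.e. the columns where a robust fit is likely to run into trouble.
zero_scale <- function(m) {
  which(apply(m, 2, function(col) mad(na.omit(col)) == 0))
}

demo <- cbind(a = x, b = 1:9)
zero_scale(demo)
```

Something like zero_scale(all) could then be used to list the problematic columns before calling lmrob() on each one.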

Now to the more interesting parts of your e-mail...

Matias

Ajay Shah wrote:
> One interesting application of a robust location estimator is in
> computing reference rates on OTC markets. Traders on an OTC market
> know the ruling price but others do not. So an information agency asks
> a bunch of dealers what the price is.
> 
> Dealers typically have positions on the market and have an incentive
> to lie. Hence, it's useful to have a robust location estimator. The
> British Bankers Association has used a `fixed trimmed mean' where the
> four most extreme observations are thrown away and the average of the
> remainder is used as the `reference rate' of the market. This is the
> method underlying LIBOR.
> 
> A while ago, Donald Lien and John Cita suggested that it would make
> more sense to experiment with a few different levels of trimming, and
> pick the one where the standard deviation of the trimmed mean
> (obtained through the bootstrap) is the lowest. They termed this the
> `adaptive trimmed mean' or the ATM.
> 
> One advantage of the above two ideas is that they are simple to
> explain to regulators and traders.
> 
> My question is: How far can contemporary knowledge in robust
> statistics improve upon this scheme? If one uses robustbase::lmrob(x ~
> 1) and gets a location estimator, would it be much better?
> 
> Here is some data for experimentation:
> 
>   load(url("http://www.mayin.org/ajayshah/tmp/all.rda"))
> 
> This gives you an object "all" which has 44 columns of data. Each of
> these columns is one set of values obtained from a bunch of dealers.
> 
> I did:
> 
>   library(robustbase)   # provides lmrob()
>   library(refrate)      # provides referencerate()
>   n <- ncol(all)        # one column per set of dealer quotes
>   results <- matrix(NA, nrow=n, ncol=4)
>   colnames(results) <- c("lmrob","median","atm","mean")
>   for (i in seq_len(n)) {
>     tmp <- na.omit(all[,i])
>     # lmrob() can fail on some columns, so trap the error
>     a <- try(lmrob(tmp ~ 1)$coefficients)
>     result <- NA
>     if (!inherits(a, "try-error")) {result <- a}
>     results[i,] <- c(result,
>                      median(tmp),
>                      referencerate(tmp)["atm"],
>                      mean(tmp))
>   }
>   cor(results, use="pairwise.complete.obs")
> 
> where the function referencerate() implements the Lien/Cita scheme
> described above. (I can email you this code if there is interest). I
> have two findings:
> 
> (a) lmrob() often breaks. It shouldn't. I have sent in one bug report.
> 
> (b) The correlation matrix shows very high correlations:
> 
>              lmrob    median       atm      mean
>   lmrob  1.0000000 
>   median 0.9998192 1.0000000 
>   atm    0.9999741 0.9998113 1.0000000 
>   mean   0.9993983 0.9994536 0.9996133 1.0000000
> 
> The correlations with the ATM are: lmrob > median > mean. So lmrob()
> and the ATM seem to agree a lot.
> 
> Looking deeper, an important feature in this (financial) application
> is that dealers should not see a location estimator where a small
> cartel can produce a large distortion in the price. So their gains from
> forming a cartel should be low. Would lmrob() be much different from
> the ATM in this?
>
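
For readers without access to referencerate(), the adaptive trimmed mean described in the quoted message can be sketched roughly as follows. This is only an illustration of the idea as stated there (try a few trimming levels, keep the one with the lowest bootstrap standard deviation); it is not the refrate implementation. The function name adaptive_trimmed_mean(), the grid of trimming levels, the number of bootstrap replicates and the example quotes are all assumptions:

```r
# Hypothetical sketch of the adaptive trimmed mean (ATM) idea; not refrate code.
# trims: fraction trimmed from EACH end, as in mean(x, trim = ...).
adaptive_trimmed_mean <- function(x, trims = c(0, 0.1, 0.2, 0.3, 0.4), B = 500) {
  x <- x[!is.na(x)]
  # Bootstrap standard deviation of the trimmed mean at each trimming level
  boot_sd <- sapply(trims, function(tr) {
    sd(replicate(B, mean(sample(x, replace = TRUE), trim = tr)))
  })
  best <- trims[which.min(boot_sd)]  # level with the most stable estimate
  list(trim = best, estimate = mean(x, trim = best), boot_sd = boot_sd)
}

set.seed(1)  # the bootstrap is random; fix the seed for reproducibility
# Made-up dealer quotes with two extreme values
quotes <- c(5.24, 5.25, 5.25, 5.26, 5.25, 5.27, 5.23, 5.25, 5.90, 4.60)
atm <- adaptive_trimmed_mean(quotes)
atm$trim
atm$estimate
```

Note that the trimming here is a fraction per tail, whereas the fixed scheme in the quoted message drops a fixed number of observations, so the two are not directly comparable without converting one to the other.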