[RsR] Robust location estimator - an interesting application in finance

Thu Sep 24 21:57:21 CEST 2009

One interesting application of a robust location estimator is in
computing reference rates on over-the-counter (OTC) markets. Traders on
an OTC market know the ruling price, but others do not. So an
information agency asks a panel of dealers what the price is.

Dealers typically have positions on the market and so have an
incentive to lie. Hence, it's useful to have a robust location
estimator. The British Bankers' Association has used a `fixed trimmed
mean', where the four highest and four lowest quotes are thrown away
and the average of the remainder is used as the `reference rate' of
the market. This is the method underlying LIBOR.
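A minimal sketch of the fixed trimmed mean in R, assuming a panel of 16
quotes with four trimmed at each end. The helper name
fixed_trimmed_mean() and the sample quotes are made up for
illustration; this is not the BBA's own code.

```r
# Fixed trimmed mean: drop the k lowest and k highest quotes and
# average the rest. fixed_trimmed_mean() is a hypothetical helper.
fixed_trimmed_mean <- function(quotes, k) {
  x <- sort(quotes[!is.na(quotes)])
  n <- length(x)
  if (2 * k >= n) stop("trimming would discard every quote")
  mean(x[(k + 1):(n - k)])
}

# Illustrative panel of 16 quotes with two wild ones (6.00 and 4.00)
quotes <- c(5.10, 5.12, 5.11, 5.13, 5.09, 5.12, 5.11, 5.10,
            5.12, 5.11, 5.13, 5.10, 5.11, 5.12, 6.00, 4.00)
fixed_trimmed_mean(quotes, k = 4)
```

For n = 16 this agrees with base R's mean(quotes, trim = 0.25), which
drops floor(0.25 * n) observations from each end; the explicit version
just makes the fixed count k visible.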

A while ago, Donald Lien and John Cita suggested that it would make
more sense to experiment with a few different levels of trimming, and
pick the one where the standard deviation of the trimmed mean
(obtained through the bootstrap) is the lowest. They termed this the
`adaptive trimmed mean' or the ATM.
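The Lien/Cita idea can be sketched in a few lines of base R. This is
not the author's referencerate(); the candidate trimming proportions,
the 1000 bootstrap resamples and the name adaptive_trimmed_mean() are
assumptions chosen for illustration.

```r
# Sketch of an adaptive trimmed mean (ATM): bootstrap the trimmed mean
# at each candidate trimming level and keep the level with the smallest
# bootstrap standard deviation. Trim levels and B are assumptions.
adaptive_trimmed_mean <- function(quotes,
                                  trims = c(0, 0.1, 0.2, 0.25, 0.3, 0.4),
                                  B = 1000) {
  x <- quotes[!is.na(quotes)]
  n <- length(x)
  # One B x n matrix of resampled indices, shared by every trim level
  idx <- matrix(sample.int(n, n * B, replace = TRUE), nrow = B)
  boot_sd <- sapply(trims, function(tr) {
    sd(apply(idx, 1, function(i) mean(x[i], trim = tr)))
  })
  best <- trims[which.min(boot_sd)]
  list(estimate = mean(x, trim = best), trim = best, boot_sd = boot_sd)
}

set.seed(1)
quotes <- c(5.10, 5.12, 5.11, 5.13, 5.09, 5.12, 5.11, 5.10,
            5.12, 5.11, 5.13, 5.10, 5.11, 5.12, 6.00, 4.00)
atm <- adaptive_trimmed_mean(quotes)
atm$trim
atm$estimate
```

Reusing one set of resampled indices for all trimming levels keeps the
comparison between levels from being dominated by resampling noise.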

One advantage of both schemes is that they are simple to explain to
regulators and traders.

My question is: how far can contemporary robust statistics improve
upon this scheme? If one uses robustbase::lmrob(x ~ 1) to obtain a
location estimate, would it be much better?
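For a feel of what an M-estimator of location does, here is a
hypothetical huber_location() in base R. It computes a Huber
M-estimate with the scale fixed at the MAD, which is a simpler relative
of the MM-estimate that lmrob(x ~ 1) returns, so its numbers will not
match lmrob() exactly.

```r
# Huber M-estimator of location via iteratively reweighted means.
# Scale is held fixed at the normalised MAD; k = 1.345 is the usual
# tuning constant (about 95% efficiency at the normal).
# NOT the MM-estimator used by robustbase::lmrob().
huber_location <- function(x, k = 1.345, tol = 1e-10, maxit = 100) {
  x <- x[!is.na(x)]
  mu <- median(x)
  s <- mad(x)                # includes the 1.4826 consistency constant
  if (s == 0) return(mu)     # over half the quotes identical
  for (it in seq_len(maxit)) {
    r <- (x - mu) / s
    w <- pmin(1, k / abs(r)) # weight 1 inside [-k, k], k/|r| outside
    mu_new <- sum(w * x) / sum(w)
    if (abs(mu_new - mu) < tol * s) return(mu_new)
    mu <- mu_new
  }
  mu
}

quotes <- c(5.10, 5.12, 5.11, 5.13, 5.09, 5.12, 5.11, 5.10,
            5.12, 5.11, 5.13, 5.10, 5.11, 5.12, 6.00, 4.00)
huber_location(quotes)
mean(quotes)
```

Unlike the trimmed means, no quote is thrown away outright: a quote far
from the centre gets weight k/|r|, so its influence on the estimate is
capped rather than removed.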

Here is some data for experimentation:

  load(url("http://www.mayin.org/ajayshah/tmp/all.rda"))

This gives you an object "all" which has 44 columns of data. Each of
these columns is one set of values obtained from a bunch of dealers.

I did:

  library(robustbase)   # provides lmrob()
  library(refrate)      # author's own package; provides referencerate()

  results <- matrix(NA, nrow=ncol(all), ncol=4)
  colnames(results) <- c("lmrob","median","atm","mean")
  for (i in seq_len(ncol(all))) {
    tmp <- na.omit(all[,i])          # quotes from one poll of dealers
    # lmrob() sometimes fails; record NA rather than stopping the loop
    a <- try(coef(lmrob(tmp ~ 1)), silent=TRUE)
    result <- if (inherits(a, "try-error")) NA else a
    results[i,] <- c(result,
                     median(tmp),
                     referencerate(tmp)["atm"],
                     mean(tmp))
  }
  cor(results, use="pairwise.complete.obs")

where the function referencerate() implements the Lien/Cita scheme
described above. (I can email you this code if there is interest.) I
have two findings:

(a) lmrob() often fails with an error (hence the try() above). It
shouldn't. I have sent in one bug report.

(b) The correlation matrix shows very high correlations:

             lmrob    median       atm      mean
  lmrob  1.0000000 
  median 0.9998192 1.0000000 
  atm    0.9999741 0.9998113 1.0000000 
  mean   0.9993983 0.9994536 0.9996133 1.0000000

All four estimators are correlated above 0.999. Ranked by correlation
with the ATM: lmrob > median > mean. So lmrob() and the ATM seem to
agree closely.

Looking deeper, an important feature in this (financial) application
is that dealers should not face a location estimator where a small
cartel can produce a large distortion in the price, so that their
gains from forming a cartel are low. Would lmrob() be much different
from the ATM in this respect?
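One way to put numbers on the cartel question is a small simulation:
let m dealers replace their honest quotes with price + push and record
how far each estimator moves on average. The honest-quote distribution
N(5, 0.01^2), panel size n = 16, cartel size m = 3 and push = 0.5 are
illustrative assumptions, and cartel_shift() is a hypothetical helper.

```r
# Average shift in an estimator when m colluding dealers all quote
# (true price + push) instead of their honest quotes.
# All distributional settings here are illustrative assumptions.
cartel_shift <- function(estimator, n = 16, m = 3, push = 0.5,
                         reps = 2000) {
  shifts <- replicate(reps, {
    honest <- rnorm(n, mean = 5, sd = 0.01)
    rigged <- honest
    rigged[seq_len(m)] <- 5 + push   # cartel members overwrite their quotes
    estimator(rigged) - estimator(honest)
  })
  mean(shifts)
}

estimators <- list(
  mean   = mean,
  median = median,
  trim25 = function(x) mean(x, trim = 0.25)  # drops 4 per end when n = 16
)

set.seed(42)
gains <- sapply(estimators, cartel_shift, m = 3)
round(gains, 4)
```

With robustbase installed, function(x) coef(robustbase::lmrob(x ~ 1))
could be added to the estimators list to compare lmrob() directly.
Varying m shows where each estimator's protection runs out; for the 25%
trimmed mean with n = 16, that happens once the cartel is larger than
the four quotes trimmed at the top.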

-- 
Ajay Shah                                      http://www.mayin.org/ajayshah  
ajayshah using mayin.org                             http://ajayshahblog.blogspot.com
<*(:-? - wizard who doesn't know the answer.