[RsR] Distribution of robust distances
Valentin Todorov
v@|ent|n@todorov @end|ng |rom che||o@@t
Tue Aug 28 23:52:34 CEST 2007
Dear Harry,
Thank you very much for this question, which raises an important issue that
comes up now and then in different forms but almost always means "Why do the
different MCD implementations give different results?". And the answer is
almost always "Because of the different consistency and small sample
correction factors used".
There are several problems in your case (and Kevin Wright's):
1. I assume you are using cov.rob() or cov.mcd() from MASS for computing the
MCD estimator (as seen in Kevin's code). These functions return the
reweighted MCD covariance matrix, while the results in the paper of Hardin
and Rocke are for the raw MCD. There is an MCD program on Jo Hardin's web
page, which is a straightforward implementation of FAST-MCD in R without
partitioning, nesting, reweighting and correcting, and which they used for
performing the computation. With this program you could reproduce the
results, but unfortunately it is very slow compared to the implementations
in native Fortran or C code (like those in MASS and rrcov/robustbase). Here
is the link:
http://pages.pomona.edu/~jsh04747/Research/mcd.est.r
2. If you use covMcd{robustbase} or CovMcd{rrcov} instead, you can: (i)
switch off the correction factors and (ii) take the raw estimates.
3. You do not need to estimate c (page 19) since it is already applied in
covMcd(): the covariance matrix has already been divided by c, i.e. you are
multiplying the distances by this factor twice.
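
To see the raw and the reweighted estimates from points 1 and 2 side by
side, something like the following sketch could be used. The simulated data,
the seed and the name fit are assumptions made only for illustration;
covMcd(), its use.correction argument and the components raw.cov, cov and
raw.cnp2 are taken from the robustbase documentation.

```r
library(robustbase)

set.seed(1)                              # arbitrary seed, illustration only
x <- matrix(rnorm(200 * 3), ncol = 3)    # simulated data (assumption): n = 200, p = 3

# Switch off the small sample correction factor
fit <- covMcd(x, use.correction = FALSE)

fit$raw.cov    # raw MCD scatter estimate
fit$cov        # reweighted MCD scatter estimate
fit$raw.cnp2   # correction factors that were applied to the raw estimate
```

Comparing fit$raw.cov with fit$cov on your own data should show how much of
the discrepancy comes from reweighting alone.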
In summary: using the raw estimates from covMcd() called with
use.correction=FALSE and setting c=1 in the code of Kevin will reproduce the
results.
covResult <- covMcd(x, use.correction=FALSE)
T <- covResult$raw.center
C <- covResult$raw.cov
c <- 1
.
m <- .
.
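
Starting from the raw center and covariance extracted above, the squared
robust distances could then be computed with mahalanobis() from the stats
package. This is only a sketch: the simulated x and the names mcd_center,
mcd_cov and d2 are assumptions for illustration (different names are used
here only to avoid masking R's T and c()), and the elided part of Kevin's
code is not reproduced.

```r
library(robustbase)

set.seed(1)                              # arbitrary seed, illustration only
x <- matrix(rnorm(200 * 3), ncol = 3)    # simulated data (assumption)

covResult  <- covMcd(x, use.correction = FALSE)
mcd_center <- covResult$raw.center       # raw MCD location estimate
mcd_cov    <- covResult$raw.cov          # raw MCD scatter estimate

# mahalanobis() returns squared distances, one per row of x
d2 <- mahalanobis(x, center = mcd_center, cov = mcd_cov)
summary(d2)
```

With c set to 1, these distances are used as they are, without any further
rescaling.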
I'll try to find the code of the simulations I did some time ago and will
post it in the next few days.
Hope this helps,
Best regards,
Valentin
----- Original Message -----
From: "Southworth, Harry" <Harry.Southworth using astrazeneca.com>
To: <R-SIG-robust using stat.math.ethz.ch>
Sent: Tuesday, August 28, 2007 4:21 PM
Subject: [RsR] Distribution of robust distances
> Hello.
>
> Has anyone implemented anything to compute quantiles of the distribution
> of robust distances following Hardin & Rocke
> (http://dmrocke.ucdavis.edu/papers/HardinRocke2005.pdf)?
>
> I've got a function to do it, but I can't reproduce the results of
> Hardin & Rocke because my function is returning values that are too high.
> Searching the R help archive, I found a message from 2004 describing the
> same problem (http://tolstoy.newcastle.edu.au/R/help/04/05/1296.html).
> The code in that message is essentially similar to mine (except that he
> uses a 1 - h/n that I think should be h/n).
>
> I'd be grateful for any pointers.
>
> Thanks,
> Harry