[RsR] robust cov for n/2 < p < n ~ 1000?

Nicholas Lewin-Koh
Fri Feb 2 14:06:52 CET 2007


Hi,
At the risk of exposing my extreme ignorance of the subtleties of
robust methods, something I have been wondering about is the
difference between robust estimators and penalized or regularized
estimators.  For covariance estimation with p >> n, the estimators
proposed by Ledoit and Wolf, bC + (1-b)T, with C the moment or ML
estimator and T the shrinkage target, seem to work quite well in
many problems, e.g. PCA, discriminant analysis, and factor analysis.
These estimators are of course biased, but the gain in efficiency is
large.  If the target is the identity matrix, or some other fixed
matrix, wouldn't that have a similar effect to what a robust
estimator is doing, moderating the effect of extreme or unusual
values that make the standard estimators unstable?  The package
corpcor has an implementation for the identity target that is easy
to modify for other targets.
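
To make the comparison concrete, here is a minimal sketch of that
kind of linear shrinkage toward a scaled identity target in plain R.
The fixed weight b and the helper name shrink_cov() are made up for
illustration; Ledoit and Wolf (and corpcor) instead estimate the
shrinkage intensity from the data:

  shrink_cov <- function(X, b) {
    C <- cov(X)                            # sample (ML-type) covariance
    T <- mean(diag(C)) * diag(ncol(X))     # scaled identity target
    b * C + (1 - b) * T                    # convex combination bC + (1-b)T
  }

  set.seed(1)
  X <- matrix(rnorm(200 * 300), 200, 300)  # n = 200 < p = 300
  S <- shrink_cov(X, b = 0.3)
  min(eigen(S, symmetric = TRUE, only.values = TRUE)$values)  # > 0

Even with b chosen naively, the shrunken estimate is positive
definite where the plain sample covariance is singular.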

Regards
Nicholas


> Date: Thu, 1 Feb 2007 19:22:28 +0100
> From: Martin Maechler <maechler using stat.math.ethz.ch>
> Subject: [RsR] robust cov for   n/2 < p < n ~ 1000?
> To: r-sig-robust using r-project.org
> Message-ID: <200702011822.l11IMSuk026054 using lynne.ethz.ch>
> 
> When reading the paper on LIBRA (2004),
> I stumbled (once more) over the fact that there seems to be
> missing functionality for the ``in between case'' of
> "low" and "high" dimensional data.
> 
> For "low dimensional", we have MCD (and similar algorithms)
> requiring p < n/2, and the authors recommend even p < n/5 for
> covMCD(alpha = 1/2).
> For "high dimensional", p > n, the use of robust PCA (and
> extensions) is recommended.
> 
> For a situation with p = 0.70 n (and think of n = 1000),
> we currently have covOGK() {in different versions, using
> different 1d-scale functions etc}, but that's quite slow for the
> situation above.  What do people do here?
> Is it just a matter of making an implementation of covOGK() that
> is optimized for speed, i.e., by computing in C instead of R code?
> 
> I now think that the "Maronna Method" (said to be based on
> Maronna (1976), Annals) may work even faster {than the OGK
> versions}, and the quadrant correlation even more probably so,
> because one only needs p median+MAD estimates there, not
> choose(p,2) of them.
> Does anyone have experience, "hearsay evidence", or recommendations?
> I think some versions of these need to go into robustbase.
> 
> BTW: It seems the S-PLUS library "robust" has now mostly been
>      ported to an R package "robust", though accompanied by a
>      peculiar Insightful licence.
> There's also the "pairwise quadrant correlation",
> which I think is a cheap version of the OGK.
> 
> Martin
> 
>
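
As a postscript to the quadrant-correlation point above, here is a
minimal sketch (the helper name quadrant_cor() and the problem sizes
are made up for illustration) of the pairwise quadrant correlation,
using the usual sin() transform for consistency at the normal model.
It needs only the p column medians, not choose(p,2) robust scale
estimates, and can be timed against covOGK() from robustbase:

  quadrant_cor <- function(X) {
    S <- sign(sweep(X, 2, apply(X, 2, median)))  # signs of median-centred columns
    R <- sin(pi / 2 * crossprod(S) / nrow(X))    # consistency correction at the normal
    diag(R) <- 1
    R
  }

  library(robustbase)                   # provides covOGK() and s_mad
  set.seed(2)
  n <- 200; p <- 140                    # p = 0.7 n, scaled down from n = 1000
  X <- matrix(rnorm(n * p), n, p)
  system.time(Rq   <- quadrant_cor(X))
  system.time(Rogk <- covOGK(X, sigmamu = s_mad)$cov)

The quadrant step is essentially a single crossprod(), which is why
it should scale much better than the pairwise robust-scale
computations inside OGK.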
