[RsR] port of LIBRA toolbox to R

Hulliger Beat beat.hulliger using fhnw.ch
Tue Feb 6 14:28:11 CET 2007


Dear colleagues

As a newcomer to this discussion list I am not sure whether you are aware of the algorithms we (Cédric Béguin and Beat Hulliger, both of the Swiss Federal Statistical Office at that time) developed in the EUREDIT project (2000-2003). I cannot pretend to know all the R packages for robust covariances, so please correct me if I am mistaken.

Transformed Rank Correlation (TRC) (Béguin and Hulliger, JRSS A, 2004) is a one-step version of OGK with Spearman rank correlations as the starting point. It was developed independently of the work by Maronna and Zamar. For small data sets it may not matter much whether you do just one iteration or iterate to convergence. The point that may be interesting is that TRC is adapted to work with sampling weights and has a built-in ad hoc imputation for missing values (which is necessary for the orthogonalisation step). TRC was first implemented in S-Plus, then ported to SAS and later to R.
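
To make the idea of "one OGK step started from Spearman correlations" concrete, here is a minimal sketch in base R. It is not the TRC code referred to above: the function name one_step_ogk_spearman is hypothetical, the data are assumed complete, all units get equal weight (no sampling weights, no imputation), and the median and MAD are assumed as the robust location and scale.

```r
# Hypothetical helper, not the TRC implementation from the post:
# one OGK-style orthogonalisation step started from Spearman rank
# correlations. Assumes complete data and equal weights.
one_step_ogk_spearman <- function(x) {
  x   <- as.matrix(x)
  med <- apply(x, 2, median)                      # coordinatewise medians
  s   <- apply(x, 2, mad)                         # coordinatewise MADs (must be > 0)
  z   <- sweep(sweep(x, 2, med, "-"), 2, s, "/")  # robustly standardised data
  r   <- cor(z, method = "spearman")              # initial correlation matrix
  e   <- eigen(r, symmetric = TRUE)$vectors       # orthogonal directions
  p   <- z %*% e                                  # data in eigenvector coordinates
  a   <- diag(s, nrow = length(s)) %*% e          # maps coordinates back to the x scale
  lam <- apply(p, 2, mad)^2                       # robust variance along each direction
  list(center = med + as.vector(a %*% apply(p, 2, median)),
       cov    = a %*% diag(lam, nrow = length(lam)) %*% t(a))
}
```

Iterating this step, with the correlation matrix of the returned covariance as the new starting point, would correspond to the "iterate to convergence" variant mentioned above.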

The BACON-EEM algorithm is the second algorithm we developed in EUREDIT. It uses an EM-algorithm at each step of the BACON algorithm to cope with missing values. In addition, the covariance matrices may use sampling weights, and the EM-algorithm is adapted to estimate the sufficient statistics at the population level (not only at the sample level).
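
As an illustration of how sampling weights can enter the covariance matrix and the resulting Mahalanobis distances, here is a small sketch using cov.wt() and mahalanobis() from the stats package. It is not BACON-EEM itself: there is no BACON subset search and no EM step for missing values, and the function name weighted_mahalanobis and the weight vector w are assumptions made for this example.

```r
# Hypothetical helper: squared Mahalanobis distances based on a
# design-weighted mean and covariance. `w` holds the sampling weights.
# Complete data only; no BACON iteration, no EM for missing values.
weighted_mahalanobis <- function(x, w) {
  x   <- as.matrix(x)
  fit <- cov.wt(x, wt = w / sum(w), method = "ML")    # weighted centre and covariance
  mahalanobis(x, center = fit$center, cov = fit$cov)  # squared distances
}
```

With method = "ML", a unit with weight 2 contributes to the centre and covariance exactly like two identical units with weight 1, which is the usual reading of a sampling weight as the number of population units represented.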

I just recently "abused" the R-code of the BACON-EEM algorithm to implement the ER-algorithm by Little and Smith (JASA 1987). ER uses a one-step M-estimator at each iteration of the EM-algorithm. 

In the EUREDIT project, Ray Chambers and his colleagues also developed a robust tree algorithm called WAID, and at SFSO we developed the Epidemic Algorithm (EA), which is more closely related to data depth, for outlier detection. WAID and EA are also available as R functions but may not be of interest for your present discussion since they do not use the covariance matrix.

It may well be that R-professionals like the ones on this list will find our R-implementations badly programmed. TRC, BACON-EEM, ER and EA are at your disposal anyway. For WAID you would have to contact Ray Chambers.

Best regards
Beat

-------------------------------------------------------
Fachhochschule Nordwestschweiz FHNW / 
University of Applied Sciences Northwestern Switzerland
Hochschule für Wirtschaft HSW / 
School of Business
Institut für Kommunikation und Marketing ICC / 
Institute for Communication and Competitiveness

Prof. Dr. Beat Hulliger
Riggenbachstrasse 16
4600 Olten
Schweiz / Switzerland
-------------------------------------------------------
T +41 62 286 0158
F +41 62 286 0090
beat.hulliger using fhnw.ch
www.fhnw.ch
-------------------------------------------------------

-----Original Message-----
From: r-sig-robust-bounces using r-project.org [mailto:r-sig-robust-bounces using r-project.org] On Behalf Of Martin Maechler
Sent: Tuesday, 6 February 2007 10:23
To: valentin.todorov using chello.at
Cc: r-sig-robust using r-project.org
Subject: Re: [RsR] port of LIBRA toolbox to R

>>>>> "ValenT" == Valentin Todorov <valentin.to using gmail.com>
>>>>>     on Mon, 5 Feb 2007 18:49:14 +0100 writes:

    ValenT> (just a short comment, to show that we are still alive, although quiet)

:-) thank you, Valentin

    [........]

    ValenT> Also, I do not remember if Martin mentioned this,
    ValenT> the cluster analysis methods of LIBRA are included
    ValenT> in the recommended R package 'cluster', again in
    ValenT> native C and FORTRAN code.

I didn't mention it because it's not directly related to robustness and also because I'm the maintainer of the 'cluster' package. Since you mention it, let me say that indeed 'cluster' has been *considerably* enhanced (and de-bugged) from the original S-plus & Fortran code of Kaufman, Rousseeuw et al.

    ValenT> I have implementations of linear discriminant analysis (using not only
    ValenT> MCD, but also OGK, M and S estimates, see the above presentation) as
    ValenT> well as of PCA, based on projection pursuit and MCD (à la ROBPCA), which
    ValenT> I intend to release soon in rrcov.

good!

    ValenT> By the way, Martin when do you expect to deliver robustbase as a
    ValenT> recommended package? 

Well, that's not my decision! It's the R-core team's decision about which - very few - R packages become recommended and are "packed with the R distribution"; so this will not simply happen.
If we continue to aim at having robustbase in a formidable state within this year (say) -- not least thanks to the contributions from people on this mailing list! -- there will at least be more good reasons to consider a 'recommended' state.

But note that this is not really necessary:
Your package rrcov (or many others) can simply have a 'Depends: robustbase' in its DESCRIPTION file [and probably an 'Imports: robustbase' with e.g. import(robustbase) in NAMESPACE] such that people who install rrcov automatically also install robustbase, provided they install via the Windows or Mac GUI, use
  install.packages(....., dependencies = TRUE)
or work in R version 2.5.0 [to be released in 2 months] or newer, where the default is 'dependencies = NA', which typically installs such dependencies.
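
As a sketch of what this amounts to in the depending package, the two files would contain fragments along the following lines. The lines starting with "##" are only labels for this example and are not part of either file; all other fields of the DESCRIPTION file are omitted.

```
## DESCRIPTION (fragment)
Package: rrcov
Depends: robustbase

## NAMESPACE (fragment)
import(robustbase)
```

The 'Depends' entry is what the installation tools look at when resolving dependencies, while the import() directive makes the exported functions of robustbase visible inside the namespace of the depending package.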

    ValenT> I still have not released a version of rrcov without
    ValenT> covMcd, ltsReg and related functions and requiring robustbase.

(and probably without those datasets that I've ported into robustbase).

Yes, I've noticed that and had been wondering a bit about the reason. I'd be glad if you made rrcov depend on robustbase.
Having (many) other packages depend on one package makes that package "recommended", at least informally.

Best regards,
Martin

_______________________________________________
R-SIG-Robust using r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-robust