[RsR] Package 'robustbase' available from CRAN

Martin Maechler maechler at stat.math.ethz.ch
Fri Feb 10 14:14:14 CET 2006

Dear "robust R users",

Yesterday I uploaded the first public version (0.1-2) of
the 'robustbase' package to CRAN.  It has appeared on the CRAN
master (in Vienna) and will propagate to the mirror sites within
a few days at most. For Windows users, the *binary* version of
the package will also hopefully become available till after the

Since I'll be vacationing for a bit more than a week (and then
be busy with other duties), I wanted to make this available in
time for your feedback and use.
Currently, packageDescription("robustbase") contains


for good reasons.  This is also noted on some of the help
pages (but not on all of those where it should be!)


 1) Many data sets, particularly from the book by Rousseeuw &
  Leroy, mostly thanks to Valentin Todorov; all of them come with
  nice help pages:

     > data(package = "robustbase")
     Data sets in package 'robustbase':

     Animals2              Brain and Body Weights for 65 Species of
			   Land Animals
     SiegelsEx             Siegel's Exact Fit Example Data
     aircraft              Aircraft Data
     airmay                Air Quality Data
     bushfire              Campbell Bushfire Data
     carrots               Insect Damages on Carrots
     cloud                 Cloud point of a Liquid
     coleman               Coleman Data Set
     delivery              Delivery Time Data
     education             Education Expenditure Data
     epilepsy              Epilepsy Attacks Data Set
     exAM                  Example Data of Antille and May - for Simple
     hbk                   Hawkins, Bradu, Kass's Artificial Data
     heart                 Heart Catherization Data
     lactic                Lactic Acid Concentration Measurement Data
     milk                  Daudin's Milk Composition Data
     pension               Pension Funds Data
     phosphor              Phosphorus Content Data
     pilot                 Pilot-Plant Data
     salinity              Salinity Data
     starsCYG              Hertzsprung-Russell Diagram Data of Star
			   Cluster CYG OB1
     telef                 Number of International Calls from Belgium
     vaso                  Vaso Constriction Data Set
     wood                  Modified Data on Wood Specific Gravity

  I strongly recommend that you start using these data sets in
  your packages and scripts even if you don't use 'robustbase'
  there yet.
  To use a data set from another package, you need neither to
  attach nor to load that package (it only has to be installed);
  it's sufficient to say, e.g.,

     data(wood, package = "robustbase") 

 2) covMcd() and covLts() as they were in a version of
    Valentin's "rrcov" package several weeks ago; these are
    slightly older than the ones in the very newest version of
    'rrcov'.  However, that will probably change with the next
    release of 'robustbase', hopefully within less than a month.

 3) New functionality that hasn't been available in "public" R
    packages until now:

    - glmrob()  {by Andreas Ruckstuhl, based on Eva Cantoni's
		  work for S-plus (and a tiny bit of mine)}

        for robust Binomial (incl. Bernoulli/Binary) and Poisson
        GLMs, including model selection based on quasi-deviance differences.

    - Qn() and Sn()  scale estimates by Rousseeuw & Croux
	   [50% breakdown point, but considerably more efficient than the MAD];
	   based on their S-plus + Fortran code; ported to R by me.

    - covOGK():  The orthogonalized Gnanadesikan-Kettenring
	 estimate for "fast" "high-dimensional" covariance estimation,
	 by Maronna & Zamar (2002); based on the code from Kjell Konis.
	 This includes their univariate tau-estimate, which I've
	 called 'scaleTau2()' {since there's a different
	 scaleTau() in other places}. However, that tau-estimate
	 currently lacks a consistency correction factor.
	 Hence covOGK() is not consistent (it is off by a constant
	 factor, so it is still useful for correlations).
	 THIS will DEFINITELY change in the future!

    - nlrob()  for robust non-linear regression; this is a
	slightly enhanced version of what has been
	available as 'rnls()' from package 'sfsmisc'.
        It is also based mainly on Andreas Ruckstuhl's work.

  4) Somewhat experimental code for an S4 class of
     "psi-function" objects.
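
To make the Qn and Sn estimators from item 3 concrete, here is a
naive base-R sketch that follows the definitions in Rousseeuw &
Croux (1993) directly, in O(n^2) time.  It is an illustration under
stated assumptions, not the package's code: the names qn_naive() and
sn_naive() are hypothetical, the constants 2.2219 and 1.1926 are the
asymptotic consistency factors at the normal distribution, and no
small-sample correction is applied, so for small n the values may
differ from those of Qn() and Sn() in 'robustbase'.

```r
## Naive O(n^2) versions of the Rousseeuw-Croux scale estimators.
## Hypothetical helper names; NOT the 'robustbase' implementation.

## Qn = d * (k-th order statistic of {|x_i - x_j|; i < j}),
## with h = floor(n/2) + 1 and k = choose(h, 2).
qn_naive <- function(x, constant = 2.2219) {
  n <- length(x)
  h <- n %/% 2 + 1
  k <- choose(h, 2)
  d <- abs(outer(x, x, "-"))
  diffs <- d[upper.tri(d)]          # all pairwise distances with i < j
  constant * sort(diffs)[k]
}

## Sn = c * lomed_i ( himed_j |x_i - x_j| )
sn_naive <- function(x, constant = 1.1926) {
  himed <- function(v) sort(v)[length(v) %/% 2 + 1]    # high median
  lomed <- function(v) sort(v)[(length(v) + 1) %/% 2]  # low median
  d <- abs(outer(x, x, "-"))
  inner <- apply(d, 1, himed)       # for each i: high median over j
  constant * lomed(inner)
}

x     <- c(1, 2, 3, 4, 5)
x_bad <- c(1, 2, 3, 4, 1000)        # one gross outlier
c(Qn = qn_naive(x),     Sn = sn_naive(x),     SD = sd(x))
c(Qn = qn_naive(x_bad), Sn = sn_naive(x_bad), SD = sd(x_bad))
```

Both naive estimates stay bounded when the last value is replaced
by 1000, whereas the classical standard deviation sd() explodes.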

Near-term plans for 'robustbase' include
(within one to two months?):

 - Porting Matias Salibian-Barrera's work on
   fast high-breakdown robust linear regression (via
   MM-estimators), based on fastLTS and, hopefully, a
   fast regression S-estimator.

 - cleanup of the "psi function" class and making use of it for
   glmrob() {and lmrob() when that exists}.

 - improvements for covMcd() and ltsReg() by Valentin

 - univariate tau-estimates for scale, *including* consistency
   correction factors; here, I'd welcome your input
   (ideally including code)

 - Implement and discuss ``the'' S4 class of (robust)
   multivariate scatter+location estimates, probably
   based on input from the "multivariate" working group in Treviso;
   but that needs more detail, and would probably benefit from
   discussion, perhaps on this mailing list.

 - Related to the above: start using a "universal" covrob()
   function that will have a 'method' argument,
   e.g., with method = "MCD" / "OGK" to call the ``lower level''
   covMcd() & covOGK() functions.
   {I think the Treviso working group "multivariate" called this function
    covCenter() ??}.
   This will return an S4 object inheriting from, but typically
   extending, the new S4 class (of the previous item).
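
The univariate tau-estimate of scale mentioned above (inside
covOGK() and among the near-term plans) can be sketched from
Maronna & Zamar's (2002) definition.  This is a hedged illustration,
not the scaleTau2() source: the name tau_scale_naive() is
hypothetical, the tuning constants c1 = 4.5 and c2 = 3 and the use of
the raw (unscaled) MAD as initial scale are assumptions, and, matching
the current state described above, no consistency correction factor
is applied, so for normal data it estimates sigma only up to a
constant factor.

```r
## Sketch of a univariate tau-estimate of scale (Maronna & Zamar, 2002),
## WITHOUT a consistency correction factor.  Hypothetical name;
## c1, c2 and the raw-MAD initial scale are assumed choices.
tau_scale_naive <- function(x, c1 = 4.5, c2 = 3) {
  med <- median(x)
  s0  <- median(abs(x - med))            # raw MAD, no 1.4826 factor
  if (s0 == 0) stop("MAD is zero; the sketch does not handle this case")
  ## location: weighted mean with weights W(u) = (1 - u^2)^2 for |u| <= 1
  u  <- (x - med) / (c1 * s0)
  w  <- ifelse(abs(u) <= 1, (1 - u^2)^2, 0)
  mu <- sum(w * x) / sum(w)
  ## scale: s0 * sqrt(mean(rho_c2(r))) with rho_c(r) = min(r^2, c^2)
  r   <- (x - mu) / s0
  rho <- pmin(r^2, c2^2)
  s0 * sqrt(mean(rho))
}

tau_scale_naive(c(1, 2, 3, 4, 5))
tau_scale_naive(c(1, 2, 3, 4, 1000))     # outlier gets weight 0 in mu
```

Because rho is truncated at c2^2, the estimate can never exceed
c2 times the initial MAD, which keeps it bounded under gross
outliers.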

For the mid and longer term, there's of course much more.
I still hope that we can pursue our original goal of receiving
all data sets and quite a bit of code that ``go along'' with the
upcoming Maronna-Martin-Yohai book.
I expect quite a bit of it will depend on what exactly happens
with the "robust" package project of Kjell Konis (and
Insightful); in particular, under which licence it will appear.

Martin Maechler, ETH Zurich
