[RsR] Package 'robustbase' available from CRAN
Martin Maechler
maechler at stat.math.ethz.ch
Fri Feb 10 14:14:14 CET 2006
Dear "robust R users",
Yesterday, I uploaded the first public version (0.1-2) of
the 'robustbase' package to CRAN. It has appeared on the CRAN
master (in Vienna) and will propagate to the mirror sites within
a few days at most. For Windows users, the *binary* version of
the package should hopefully become available shortly after the
weekend.
Since I'll be on vacation for a little more than a week (and then
busy with other duties), I wanted to make it available before
then, so that you can use it and give feedback.
Currently, packageDescription("robustbase") contains the note
  NOTE: SEVERAL PARTS ARE STILL PRELIMINARY AND MAY BE
  CHANGED IN THE FUTURE. THIS TYPICALLY INCLUDES
  ARGUMENT NAMES, DEFAULTS FOR ARGUMENTS AND RETURN
  VALUES.
and for good reason. This is also mentioned on some of the help
pages (but not yet on all of those where it should be!).
Content:
--------
1) Many data sets, particularly from the book by Rousseeuw &
Leroy, mostly thanks to Valentin Todorov; all of them come with
nice help pages:
> data(package = "robustbase")
Data sets in package 'robustbase':
Animals2    Brain and Body Weights for 65 Species of Land Animals
SiegelsEx   Siegel's Exact Fit Example Data
aircraft    Aircraft Data
airmay      Air Quality Data
bushfire    Campbell Bushfire Data
carrots     Insect Damages on Carrots
cloud       Cloud point of a Liquid
coleman     Coleman Data Set
delivery    Delivery Time Data
education   Education Expenditure Data
epilepsy    Epilepsy Attacks Data Set
exAM        Example Data of Antille and May - for Simple Regression
hbk         Hawkins, Bradu, Kass's Artificial Data
heart       Heart Catherization Data
lactic      Lactic Acid Concentration Measurement Data
milk        Daudin's Milk Composition Data
pension     Pension Funds Data
phosphor    Phosphorus Content Data
pilot       Pilot-Plant Data
salinity    Salinity Data
starsCYG    Hertzsprung-Russell Diagram Data of Star Cluster CYG OB1
telef       Number of International Calls from Belgium
vaso        Vaso Constriction Data Set
wood        Modified Data on Wood Specific Gravity
I strongly recommend that you start using these data sets in
your packages and scripts, even if you don't use 'robustbase'
there yet.
To use a data set from another package, you don't need to
attach or load that package; it is sufficient to say, e.g.,
  data(wood, package = "robustbase")
2) covMcd() and ltsReg() as they were in a version of
Valentin's "rrcov" package several weeks ago; these are
slightly older than the versions in the very latest 'rrcov'.
This will probably change with the next release of
'robustbase', hopefully in less than a month.
3) New functionality that hasn't been available in "public" R
packages until now:
- glmrob() {by Andreas Ruckstuhl, based on Eva Cantoni's
work for S-plus (and a tiny bit of mine)}
for robust Binomial (incl. Bernoulli/binary) and Poisson
GLMs, including model selection based on quasi-deviance differences.
- Qn() and Sn() scale estimates by Rousseeuw & Croux
[50% breakdown point, but considerably more efficient than the
MAD at the normal: about 82% (Qn) and 58% (Sn) vs. 37%];
based on their S-plus + Fortran code; ported to R by me.
- covOGK(): the orthogonalized Gnanadesikan-Kettenring
estimate for fast covariance estimation in high dimensions,
by Maronna & Zamar (2002); based on code from Kjell Konis.
This includes their univariate tau-estimate of scale, which
I've called 'scaleTau2()' {since there's a different
scaleTau() elsewhere}. However, that tau-estimate
currently lacks a consistency correction factor.
Hence covOGK() is not consistent at the normal model; it is
off by a constant factor, so it is still useful for correlations.
THIS will DEFINITELY change in the future!
- nlrob() for robust non-linear regression; this is a
slightly enhanced version of what has been
available as 'rnls()' in package 'sfsmisc'.
It is also based mainly on Andreas Ruckstuhl's work.
4) Somewhat experimental code for an S4 class of
"psi-function" objects.
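As a concrete illustration of the Qn and Sn definitions, here is a naive O(n^2) sketch in base R, written from Rousseeuw & Croux's published formulas. The helpers qn_naive() and sn_naive() are hypothetical and not the 'robustbase' implementation: the package's Qn() and Sn() use the authors' faster algorithms and may apply small-sample correction factors, so values can differ for small n.

```r
## Naive O(n^2) sketches of the Rousseeuw-Croux scale estimators.
## Hypothetical helpers for illustration only; no small-sample corrections.

## Qn = d * (k-th smallest of |x_i - x_j|, i < j),
## with h = floor(n/2) + 1 and k = choose(h, 2).
qn_naive <- function(x) {
  n <- length(x)
  h <- n %/% 2 + 1
  k <- choose(h, 2)
  adiff <- abs(outer(x, x, "-"))
  pairs <- adiff[upper.tri(adiff)]     # the n(n-1)/2 pairs with i < j
  d <- 1 / (sqrt(2) * qnorm(5/8))      # normal-consistency constant, ~2.219
  d * sort(pairs)[k]
}

## Sn = c * lomed_i himed_j |x_i - x_j| with c = 1.1926;
## j runs over all n points, including j = i.
sn_naive <- function(x) {
  n <- length(x)
  adiff <- abs(outer(x, x, "-"))
  himed <- function(v) sort(v)[n %/% 2 + 1]   # high median of n values
  inner <- apply(adiff, 1, himed)              # himed_j, one value per i
  1.1926 * sort(inner)[(n + 1) %/% 2]          # low median over i
}

x <- c(1:9, 1000)   # nine well-behaved points plus one gross outlier
c(sd = sd(x), Qn = qn_naive(x), Sn = sn_naive(x))
```

For this x, sd() is above 300, while both robust scale estimates stay below 5.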
Near-term plans for 'robustbase' include:
-----------------------------------------
(within one or two months?)
- Porting Matias Salibian-Barrera's work on fast,
high-breakdown robust linear regression (via
MM-estimators), based on fastLTS and, hopefully, a
fast regression S-estimator.
- cleanup of the "psi function" class and making use of it in
glmrob() {and in lmrob(), once that exists}.
- improvements for covMcd() and ltsReg() by Valentin
- univariate tau-estimates for scale, *including* consistency
correction factors; here, I'd welcome your input
(ideally including code).
- Implement and discuss ``the'' S4 class of (robust)
multivariate scatter+location estimates; probably
based on input from the working group "multivariate" in Treviso
(http://www.econ.kuleuven.be/public/NDBAE49/R/Rmultivariate.pdf);
but that needs more detail and would probably benefit from
discussion, perhaps on this mailing list.
- Related to the above: start using a "universal" covrob()
function with a 'method' argument;
e.g., method = "MCD" / "OGK" would call the ``lower level''
covMcd() & covOGK() functions.
{I think the Treviso working group "multivariate" called this function
covCenter() ??}.
This will return an S4 object whose class inherits from, and
typically extends, the new S4 class of the previous item.
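The one-step OGK construction behind covOGK(), which a covrob(method = "OGK") would call, can be sketched in base R together with the Maronna-Zamar tau scale. This is a sketch under stated assumptions, not the package code: the helpers tau_scale_sketch() and ogk_sketch() are hypothetical; the tuning constants c1 = 4.5 and c2 = 3, mad() as the initial scale, and median() as the location of the projected scores are assumptions; there is no iteration or reweighting; and, as with the scaleTau2() described above, no consistency correction is applied.

```r
## Maronna-Zamar tau scale, WITHOUT a consistency correction factor.
## Assumptions: mad() (incl. its 1.4826 factor) as initial scale s0,
## c1 = 4.5 for the location weights, c2 = 3 for the truncated rho.
tau_scale_sketch <- function(x, c1 = 4.5, c2 = 3) {
  s0 <- mad(x)
  u  <- (x - median(x)) / s0
  w  <- ifelse(abs(u) <= c1, (1 - (u / c1)^2)^2, 0)  # bisquare-type weights
  mu <- sum(w * x) / sum(w)                         # weighted location
  rho <- pmin(((x - mu) / s0)^2, c2^2)              # min(r^2, c2^2)
  s0 * sqrt(mean(rho))
}

## One-step orthogonalized Gnanadesikan-Kettenring (OGK) estimate.
ogk_sketch <- function(X, scale_fn = tau_scale_sketch, loc_fn = median) {
  X <- as.matrix(X)
  p <- ncol(X)
  s <- apply(X, 2, scale_fn)                # robust column scales
  Y <- sweep(X, 2, s, "/")                  # scale-standardized columns
  ## Gnanadesikan-Kettenring pairwise covariance:
  ## cov(a, b) = (s(a + b)^2 - s(a - b)^2) / 4
  U <- diag(p)
  for (j in seq_len(p - 1)) {
    for (k in (j + 1):p) {
      U[j, k] <- U[k, j] <-
        (scale_fn(Y[, j] + Y[, k])^2 - scale_fn(Y[, j] - Y[, k])^2) / 4
    }
  }
  E <- eigen(U, symmetric = TRUE)$vectors   # orthogonal basis from U
  Z <- Y %*% E                              # projected scores
  Gamma <- diag(apply(Z, 2, scale_fn)^2, p) # robust variances of scores
  nu <- apply(Z, 2, loc_fn)                 # robust locations of scores
  D <- diag(s, p)
  list(center = drop(D %*% E %*% nu),       # back to the original scale
       cov    = D %*% E %*% Gamma %*% t(E) %*% D)
}

## Positively correlated bivariate data with 10% gross outliers.
set.seed(1)
x1 <- rnorm(200)
X <- cbind(x1, x1 + rnorm(200, sd = 0.5))
X[1:20, ] <- cbind(rnorm(20, 10), rnorm(20, -10))
fit <- ogk_sketch(X)
cov2cor(fit$cov)[1, 2]   # robust correlation
cor(X)[1, 2]             # classical correlation
```

With these outliers, the classical correlation comes out negative, while the OGK sketch keeps the positive correlation of the clean data. Because this tau scale has no consistency factor, fit$cov is only correct up to a constant factor; correlations are unaffected by such a factor.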
For the medium and longer term, there is of course much more.
I still hope that we can pursue our original goal of receiving
all the data sets and quite a bit of the code that ``go along'' with
the upcoming Maronna-Martin-Yohai book.
I expect that quite a bit of this will depend on what exactly
happens with the "robust" package project of Kjell Konis (and
Insightful); in particular, under which licence it will appear.
Martin Maechler, ETH Zurich