[RsR] estimators based on random samples... - should be random

Martin Maechler m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Mon May 1 18:02:07 CEST 2006


Many of the "high breakdown point" robust estimators, and
particularly those currently in the 'robustbase' package are
based on random (re)sampling, i.e., chosing random subsets of
the data, then compute "candidate estimates" for each sample,
and then going on. 

This "resampling" has also been a main ingredient for other "modern"
statistical algorithms, e.g. kmeans() clustering, and of course
for all MCMC or other simulation based methods.

In R, we have always adhered to the convention, that such
estimators should use R's random number generators (=: RNGs) and
hence their result will be a function of the initial random seed --
.Random.seed in S and R, typically set via  set.seed().

The current algorithm implmentations in 'robustbase' however do
not adhere to the convention, but rather use an own (cheap) RNG
[covMcd(), ltsReg()] or the RNG provided by the operating system
C library rand() function [lmrob()] --- and in all these cases,
always use the same random seed, by default.

Of course, this has the advantage that all your students get the
same estimates for the same data (well, at least on the same
computer hardware and software combination), but I think we
should switch to using R's RNGs and have all these results
properly depend on the current random seed, i.e. typically only
give the same results after the set.seed(<n>) call.

Package 'MASS' function lqs(), i.e. lqs.default() has an
optional argument  'seed'  with this help description:

  seed: the seed to be used for random sampling: see
       '.Random.seed'.  The current value of '.Random.seed' will
       be preserved if it is set.. 

I propose to do the same for all "sampling" functions in
'robustbase'.  This will change the meaning of 'seed' for the
rrcov.control() list used in covMcd() and ltsReg() which already
now is there, but is used for a simple builtin RNG, and does not
depend on R's .Random.seed at all.

What do you think?

Martin Maechler, ETH Zurich




More information about the R-SIG-Robust mailing list