[RsR] estimators based on random samples... - should be random

Matias Salibian-Barrera matias at stat.ubc.ca
Mon May 1 19:19:25 CEST 2006


Hello,

Thanks Martin for (once again!) taking the lead in sparking a 
discussion. My comments are inserted below.

> In R, we have always adhered to the convention, that such
> estimators should use R's random number generators (=: RNGs) and
> hence their result will be a function of the initial random seed --
> .Random.seed in S and R, typically set via  set.seed().

A good convention, IMHO.
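
To make the convention concrete, here is a minimal sketch in base R. The
estimator `random_pair_slope()` is a hypothetical toy of my own (it is not
part of robustbase); the only point is that it draws its subsamples with
sample.int(), so its result depends on .Random.seed and can be reproduced
with set.seed().

```r
# Hypothetical toy estimator (not from robustbase): the median of slopes
# through randomly chosen pairs of points. The pairs are drawn with
# sample.int(), i.e. with R's RNG, so the result depends on .Random.seed.
random_pair_slope <- function(x, y, n_pairs = 200) {
  n <- length(x)
  slopes <- numeric(n_pairs)
  for (k in seq_len(n_pairs)) {
    idx <- sample.int(n, 2)  # two distinct indices
    slopes[k] <- (y[idx[2]] - y[idx[1]]) / (x[idx[2]] - x[idx[1]])
  }
  median(slopes)
}

# Simulated data (assumed for illustration); the x values are all distinct,
# so no pair gives a zero denominator.
set.seed(1)
x <- 1:30
y <- 2 * x + rnorm(30)

set.seed(123); fit1 <- random_pair_slope(x, y)
set.seed(123); fit2 <- random_pair_slope(x, y)
fit3 <- random_pair_slope(x, y)  # no reset: continues the RNG stream

identical(fit1, fit2)  # TRUE: same seed, same estimate
c(fit1, fit3)          # fit3 need not equal fit1
```

A user who wants repeatable output only has to call set.seed() before the
fit, exactly as with any other simulation-based result in R.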

> The current algorithm implementations in 'robustbase' however do
> not adhere to the convention, but rather use their own (cheap) RNG
> [covMcd(), ltsReg()] or the RNG provided by the operating system
> C library rand() function [lmrob()] --- and in all these cases,
> always use the same random seed, by default.

I believe this (each algorithm using its own or the operating system's 
RNG) is merely due to the "atomized" nature of the development of the 
separate pieces of code that are now in robustbase, and does not reflect 
an "a priori" design criterion.

> Of course, this has the advantage that all your students get the
> same estimates for the same data (well, at least on the same
> computer hardware and software combination), but I think we
> should switch to using R's RNGs and have all these results
> properly depend on the current random seed, i.e. typically only
> give the same results after the set.seed(<n>) call.

Probably the most noticeable effect of this change would be that in some 
cases consecutive calls to fit the same model on the same data may yield 
different results, and high levels of anxiety among "uninitiated" users 
will surely follow...

I guess that if the convergence criteria of these algorithms are 
sufficiently tight, then this will typically happen only in those cases 
where the existence of two (or more) solutions is actually informative 
(and probably relevant for the analysis). Maybe somebody has had other 
experiences?

I second Martin's suggestion, but add that we accompany this change with 
good examples (one for each model?) in the documentation illustrating 
how different solutions can yield more insight into the analysis.
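
As a rough idea of what such an example could look like, here is a sketch
in base R. Everything in it is an assumption made for illustration: the
data are simulated from two crossing lines of equal size, and
`rough_lts()` is a hypothetical, deliberately crude random-subsample
LTS-type fit, not the algorithm used by ltsReg().

```r
# Hypothetical helper (not the ltsReg() algorithm): fit a line through
# random pairs of points and keep the candidate with the smallest sum of
# the h smallest squared residuals (an LTS-type criterion).
rough_lts <- function(x, y, n_samp = 20, h = floor(length(x) / 2) + 1) {
  best <- NULL
  best_crit <- Inf
  for (k in seq_len(n_samp)) {
    idx <- sample.int(length(x), 2)  # elemental subsample via R's RNG
    b <- (y[idx[2]] - y[idx[1]]) / (x[idx[2]] - x[idx[1]])
    a <- y[idx[1]] - b * x[idx[1]]
    r2 <- sort((y - a - b * x)^2)
    crit <- sum(r2[seq_len(h)])
    if (crit < best_crit) {
      best_crit <- crit
      best <- c(intercept = a, slope = b)
    }
  }
  best
}

# Simulated data: half of the points follow y = 1 + 2x, the other half
# y = 21 - 2x, so neither line is supported by a clear majority.
set.seed(42)
x <- seq(0, 10, length.out = 60)  # distinct x values
grp <- rep(c(1, 2), 30)
y <- ifelse(grp == 1, 1 + 2 * x, 21 - 2 * x) + rnorm(60, sd = 0.3)

# Refit under several seeds and collect the estimated slopes.
slopes <- sapply(1:20, function(s) {
  set.seed(s)
  rough_lts(x, y)[["slope"]]
})
round(sort(slopes), 2)
```

With data like these one would expect the slopes to fall into two groups,
one near each line, depending on the seed. That variation is the
informative part: it suggests the data contain two competing structures
rather than one majority pattern plus outliers.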

Matias


--
______________________________________________________________
Matias Salibian-Barrera - Department of Statistics
University of British Columbia - matias using stat.ubc.ca
Phone: (604) 822-3410 - Fax: (604) 822-6960



