[RsR] estimators based on random samples... - should be random

Mon May 8 13:50:14 CEST 2006

Hello,

I agree with the R convention, i.e. to use R's random number generators 
and the set.seed() function to set the seed. Do you have 
any suggestions on how to call these functions from C and 
Fortran? Are there examples or documentation I could look at? (For me, at 
least, this is one of the reasons I still use generators from outside R.)

As Matias points out at the end of this email, methods may have more than 
one solution. In the wle package I have implemented some plots to contrast 
different solutions. An example is in wle.lm (example(wle.lm)).

Claudio

On Mon, 1 May 2006, Matias Salibian-Barrera wrote:

>
> Hello,
>
> Thanks Martin for (once again!) taking the lead in sparking a
> discussion. My comments are inserted below.
>
>> In R, we have always adhered to the convention, that such
>> estimators should use R's random number generators (=: RNGs) and
>> hence their result will be a function of the initial random seed --
>> .Random.seed in S and R, typically set via  set.seed().
>
> A good convention, IMHO.
>
>> The current algorithm implementations in 'robustbase', however, do
>> not adhere to the convention, but rather use their own (cheap) RNG
>> [covMcd(), ltsReg()] or the RNG provided by the operating system
>> C library rand() function [lmrob()] --- and in all these cases,
>> always use the same random seed, by default.
>
> I believe this (each algorithm using its own or the operating system's
> RNG) is merely due to the "atomized" nature of the development of the
> separate pieces of code that are now in robustbase, and does not reflect
> an "a priori design criterion".
>
>> Of course, this has the advantage that all your students get the
>> same estimates for the same data (well, at least on the same
>> computer hardware and software combination), but I think we
>> should switch to using R's RNGs and have all these results
>> properly depend on the current random seed, i.e. typically only
>> give the same results after the set.seed(<n>) call.
>
> Probably the most noticeable effect of this change would be that in some
> cases consecutive calls to fit the same model on the same data may yield
> different results, and high levels of anxiety in the "uninitiated" user
> will surely follow...
>
> I guess if the convergence criteria of these algorithms are sufficiently
> tight then this will typically happen only in those cases where the
> existence of two (or more) solutions is actually informative (and
> probably relevant for the analysis). Maybe somebody has had other
> experiences?
>
> I second Martin's suggestion, but suggest that we accompany this change with
> good examples (one for each model?) in the documentation illustrating
> how different solutions can yield more insight into the analysis.
>
> Matias
>
>
> --
> ______________________________________________________________
> Matias Salibian-Barrera - Department of Statistics
> University of British Columbia - matias using stat.ubc.ca
> Phone: (604) 822-3410 - Fax: (604) 822-6960
>

--------------------------------------------------------------
Claudio Agostinelli
Dipartimento di Statistica
Universita' Ca' Foscari di Venezia
San Giobbe, Cannaregio 873
30121 Venezia
Tel: 041 2347446, Fax: 041 2347444
email: claudio using unive.it, www: www.dst.unive.it/~claudio
--------------------------------------------------------------
Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html