[Rd] simulate in stats
Martin Maechler
maechler at stat.math.ethz.ch
Thu Sep 29 12:08:05 CEST 2005
>>>>> "PaulG" == Paul Gilbert <pgilbert at bank-banque-canada.ca>
>>>>> on Thu, 15 Sep 2005 12:07:48 -0400 writes:
PaulG> BTW, I think there is a problem with the way the
PaulG> argument "seed" is used in the new simulate in stats.
PaulG> The problem is that repeated calls to simulate using
PaulG> the default argument will introduce a new pattern
PaulG> into the RNG:
>> stats:::simulate
PaulG> function (object, nsim = 1, seed = as.integer(runif(1, 0,
PaulG> .Machine$integer.max)), ...)
PaulG> UseMethod("simulate")
PaulG> <environment: namespace:stats>
>> stats:::simulate.lm
PaulG> function (object, nsim = 1, seed = as.integer(runif(1, 0,
PaulG> .Machine$integer.max)), ...)
PaulG> {
PaulG> if (!exists(".Random.seed", envir = .GlobalEnv))
PaulG> runif(1)
PaulG> RNGstate <- .Random.seed
PaulG> set.seed(seed)
PaulG> ...
PaulG> This should not be done, as the resulting RNG has not
PaulG> been studied or proven. A better mechanism is to have
PaulG> a default argument equal NULL, and not touch the seed
PaulG> in that case.
I agree so far. I think the default seed should really be NULL
(with your semantic!) rather than a random number.
PaulG> There are several examples of this in
PaulG> the package dse1 (in bundle dse), see for example
PaulG> simulate.ARMA and simulate.SS. They also use the
PaulG> utilities in the setRNG package to save more of the
PaulG> information necessary to reproduce
PaulG> simulations. Roughly it is done like this:
PaulG> simulate.x <- function (model, rng = NULL, ...)
PaulG> {if (is.null(rng)) rng <- setRNG()
PaulG> ## returns the RNG setting to be saved with the result
PaulG> else {
PaulG> old.rng <- setRNG(rng)
PaulG> on.exit(setRNG(old.rng))
PaulG> }
PaulG> ...
as nobody has further delved into this in the mean time,
this is definitely too late for R 2.2.0, even if it was desired.
But I also think you should be able to live with interpreting
'seed' as 'rng' if you want, shouldn't you?
PaulG> The seed by itself is not very useful if the purpose
PaulG> is to be able to reproduce things, and I think it
PaulG> would be a good idea to incorporate the few small
PaulG> functions setRNG into stats (especially if the
PaulG> simulate mechanism is being introduced).
maybe we should reopen this topic {adopting ideas or even exact
implementations from your 'setRNG' into stats} in a few weeks,
when R 2.2.0 is released.
PaulG> The argument "nsim" presumably alleviates to some
PaulG> extent the above concern about changing the RNG
PaulG> pattern. However, in my fairly extensive experience
PaulG> it is not very workable to produce all the
PaulG> simulations and then do the analysis of them.
PaulG> In a Monte Carlo experiment the generated data set is just
PaulG> too big.
I believe this depends very much on the topic. The simulate()
uses that we had envisaged with simulate() don't save all the
models and then analyze them.
But maybe I'm misunderstanding your point completely here.
PaulG> A better approach is to do the analysis and save
PaulG> only necessary information after each
PaulG> simulation. That is the approach, for example, in
PaulG> dse2:::EstEval.
PaulG> Paul
PaulG> Paul Gilbert wrote:
>> Can the arguments nsim and seed be passed as part of ... in the new
>> simulate generic in R-2.2.0alpha package stats?
>> This would potentially allow me to use the stats generic rather than
>> the one I define in dse. There are contexts where nsim and seed do not
>> make sense.
Well, the current specification for simulate() has been different
explicitly.
I agree that there are situations where both 'nsim' and 'seed'
(or a generalization, say 'RNGstate') wouldn't make sense and
one still would like to use something like "simulate" in the
function name.
>> I realize that the default arguments could be ignored, but
>> it does not really make sense to introduce a new generic with that in
>> mind.
I think it would depend on the exaxt context if I would rather
use a (slightly) different function name, or just ignore the
ignorable arguments as you mention.
>> (I would also prefer that the "object" argument was called
>> "model" but this is less important.)
I'd personally agree with that; the argument was that
'object' is very generally used in such situations.
Martin Maechler
More information about the R-devel
mailing list