[R] Simulation (multinomial deviate generation)

Wed Dec 6 17:46:32 CET 2000

  I've collected various implementations of multinomial deviate generation
in S/R.  There's an "rmult" package by Jonathan Rougier on STATLIB's S
archive (http://lib.stat.cmu.edu/S/rmult), which runs unmodified under R,
but is much slower than your code #1.

  The following code (which I didn't write but have tweaked -- it could be
improved further; unfortunately I don't have the author's name any more)
is much faster *if* you can generate all of your deviates at the same
time, but twice as slow if you do them one at a time.

## (rmult.gb1 and rmult.gb2 are just your code)

tot <- 100000
system.time(for (i in 1:tot) rmult.gb1(5,10))
## [1] 21.27  1.02 22.37  0.00  0.00
system.time(for (i in 1:tot) rmult.gb2(5,10))
## [1] 137.78   3.59 142.22   0.00   0.00
system.time(for (i in 1:tot) rmulti(1,rep(1/5,5),10))
## [1] 50.07  1.05 51.36  0.00  0.00
system.time(rmulti(tot,rep(1/5,5),10))  ## vectorize
## [1] 0.17 0.02 0.19 0.00 0.00

rmulti<-function (n, size, prob)
  ## pick multinomial deviates
{
  if (size==0) stop("rmulti: size==0")
  if (is.matrix(prob)) {
    l <- ncol(prob)
    totp <- prob %*% rep(1, l)
  }
  else {
    l <- length(prob)
    totp <- sum(prob)
    prob <- matrix(prob, nrow = 1)
  }
  siz <- rep(size, length.out = n)
  x <- matrix(numeric(n * l), ncol = l)
  for (i in 1:(l - 1)) {
    ## inserted min() to deal with roundoff problems that led to p>1.0
    x[, i] <- rbinom(n, siz, min(prob[, i]/totp,1.0))
    siz <- siz - x[, i]
    totp <- totp - prob[, i]
    totp[totp <= 0] <- 1
  }
  x[,l]<-siz
  x
}

On Wed, 6 Dec 2000, gb wrote:

>
> I want to draw a random sample Y_1, ..., Y_n, iid Poisson,
> but condition on their sum being equal to  k. Two
> suggestions:
>
> 1)   lambda <- k / n ## optimal value
>      ysum <- -1
>      while (ysum != k){
>         y <- rpois(n, lambda)
>         ysum <- sum(y)
>      }
>      y
>
> 2) Aiming at Multinom(k, 1/n, ..., 1/n):
>
>    y <- as.vector(table(factor(sample(n, k, replace = T), levels = 1:n)))
>
> 1) seems to be faster than 2), but my question is whether there is
> something even better?  'sample' is fast enough, but how can I do
> the rest (factor, table) smarter in 2)?
>
> Thanks for any suggestions!
>
> Göran
>

-- 
318 Carr Hall                                bolker at zoo.ufl.edu
Zoology Department, University of Florida    http://www.zoo.ufl.edu/bolker
Box 118525                                   (ph)  352-392-5697
Gainesville, FL 32611-8525                   (fax) 352-392-3704

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._