[R] RE: rmultinom

Thu Aug 8 06:08:09 CEST 2002

Hi Mark:

I had also used sample and tabulate for generating multinomial and found it
to be quite slow.  So I had written a multinomial random numbers generator
based on the GENMUL subroutine from "ranlib", which in turn is based on the
algorithm from Luc Devroye's book on "Non-Uniform Random Variate Generation"
You may want to compare this with your hybrid algorithm and see their
relative performance.

Best,
Ravi.

###############################################################
##      A multinomial random vector generator
rmultinom <- function(n, p){
 ncat <- length(p)
#     Check Arguments
      if (n < 0) {cat("n < 0 ","\n"); break}
      if (ncat <= 1) {cat("ncat <= 1 ","\n"); break}
      if (any(p < 0.0)) {cat("Some P(i) < 0 ","\n"); break}
      if (any(p > 1.0)) {cat("Some P(i) > 1.0 ","\n"); break}
 eps <- .Machine$double.eps^0.9
      if (sum(p) > (1.0 + eps) | sum(p) < (1.0 - eps) ) {cat("Sum of P(i)
should equal 1.0 ","\n"); break}

#     Initialize variables
      ntot <- n
      sum <- 1.0
      ix <- rep(0,ncat)

#     Generate the observation
      for (icat in 1:(ncat - 1)) {
          prob <-  p[icat]/sum
          ix[icat] <-  rbinom(1,ntot,prob)
          ntot <- ntot - ix[icat]
          if (ntot <= 0) return(ix)
          sum <- sum - p[icat]
 }
      ix[ncat] <- ntot
 return (ix)
}

-----Original Message-----
From: Mark.Bravington at csiro.au <Mark.Bravington at csiro.au>
To: r-help at stat.math.ethz.ch <r-help at stat.math.ethz.ch>
Date: Wednesday, August 07, 2002 11:16 PM
Subject: [R] RE: rmultinom

>Dear newsgroup,
>
>There was a recent post suggesting the incorporation of a standard
>rmultinom(...). This seems like a good idea, but I wasn't sure about basing
>this on tabulate( sample( ...)). Despite the attractive succinctness, this
>could be very slow and use lots of memory if n or size is large. Instead,
>I've tended to use a loop over the boxes of the multinomial, taking
>successive binomial samples from the remaining available observations each
>time.
>
>To check, I did some timing comparisons. Unsurprisingly, the tabulate()
>version is much slower for large "size" (number of observations per box).
>However, the successive binomial version is slower when observations are
>sparse, presumably because it spends most of its time filling in zeros that
>tabulate() doesn't worry about.
>
>test> # Low dimension, many observations per box
>test> length( ranvec)
>[1] 7
>test> system.time( sucbin.rmultinom( 1000, 1000, ranvec))
>[1] 0.02 0.00 0.02   NA   NA
>test> system.time( tab.rmultinom( 1000, 1000, ranvec))
>[1] 0.19 0.00 0.18   NA   NA
>
>test> # High dimension, few observations per box
>test> length( ranvec.long)
>[1] 1000
>test> system.time( sucbin.rmultinom( 10000, 10, ranvec.long))
>[1] 18.42  0.06 18.54    NA    NA
>test> system.time( tab.rmultinom( 10000, 10, ranvec.long))
>[1] 0.97 0.10 1.06   NA   NA
>
>A good option for standard use might be a hybrid, with a "method" parameter
>saying which approach to use. Verrry roughly, this might default to "use
>successive binomials if size > length( prob)". The optimal cutoff seems to
>have a more complex dependence on n, size, and p, but this default choice
>should avoid the worst problems.
>
>As a suggestion, I've included code for a hybrid. Underscore-haters may
wish
>to stop reading here :)
>
># first, two standard functions of mine: these could go inside or be coded
>explicitly, of course
># Avoid unintended behaviour with : when 2nd arg is lower than 1st
>assign( '%upto%', function( from, to) if( from <= to) from:to else numeric(
>0))
>
># Trim tail of vector
>clip_ function( x, n=1) x[ 1 %upto% ( length( x) - n)]
>
>rmultinom_ function( n, size, prob, method=if(k>size) "tabulate" else
>"sucbin") {
>  # returns a n X length( prob) matrix of integers, each row adding up to
>size
>
>  k_ length( prob)
>  if( pmatch( method, c( 'sucbin', 'tabulate')) == 1) { # successive
>binomials
>    prob_ prob / sum( prob)
>    prob[-1]_ prob[-1] / (1-clip( cumsum( prob)))
>    x_ matrix( as.integer( 0), n, k)
>    nn_ rep( size, n)
>
>    for( i in 1 %upto% (k-1)) {
>      x[,i]_ as.integer( rbinom( n, size=nn, prob=prob[ i]))
>      nn_ nn - x[,i]
>    }
>
>    x[,k]_ nn
>  } else # tabulation
>    x_ matrix( tabulate( sample( k, n * size, repl = TRUE, prob) + k * 0
>%upto% (n - 1), nbins = n * k),
>        nrow = n, ncol = k, byrow = TRUE) # NB using ":" instead of
"%upto%"
>would fail for n=1
>
>return( x)
>}
>
>cheers
>Mark
>
>*******************************
>
>Mark Bravington
>CSIRO (CMIS)
>PO Box 1538
>Castray Esplanade
>Hobart
>TAS 7001
>
>phone (61) 3 6232 5118
>fax (61) 3 6232 5012
>Mark.Bravington at csiro.au
>-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
.-.-
>r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
>Send "info", "help", or "[un]subscribe"
>(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
>_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
._._

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._