[R] RE: rmultinom
Ravi Varadhan
rvaradha at jhsph.edu
Thu Aug 8 06:08:09 CEST 2002
Hi Mark:
I had also used sample and tabulate for generating multinomial and found it
to be quite slow. So I had written a multinomial random numbers generator
based on the GENMUL subroutine from "ranlib", which in turn is based on the
algorithm from Luc Devroye's book on "Non-Uniform Random Variate Generation"
You may want to compare this with your hybrid algorithm and see their
relative performance.
Best,
Ravi.
###############################################################
## A multinomial random vector generator
rmultinom <- function(n, p){
ncat <- length(p)
# Check Arguments
if (n < 0) {cat("n < 0 ","\n"); break}
if (ncat <= 1) {cat("ncat <= 1 ","\n"); break}
if (any(p < 0.0)) {cat("Some P(i) < 0 ","\n"); break}
if (any(p > 1.0)) {cat("Some P(i) > 1.0 ","\n"); break}
eps <- .Machine$double.eps^0.9
if (sum(p) > (1.0 + eps) | sum(p) < (1.0 - eps) ) {cat("Sum of P(i)
should equal 1.0 ","\n"); break}
# Initialize variables
ntot <- n
sum <- 1.0
ix <- rep(0,ncat)
# Generate the observation
for (icat in 1:(ncat - 1)) {
prob <- p[icat]/sum
ix[icat] <- rbinom(1,ntot,prob)
ntot <- ntot - ix[icat]
if (ntot <= 0) return(ix)
sum <- sum - p[icat]
}
ix[ncat] <- ntot
return (ix)
}
-----Original Message-----
From: Mark.Bravington at csiro.au <Mark.Bravington at csiro.au>
To: r-help at stat.math.ethz.ch <r-help at stat.math.ethz.ch>
Date: Wednesday, August 07, 2002 11:16 PM
Subject: [R] RE: rmultinom
>Dear newsgroup,
>
>There was a recent post suggesting the incorporation of a standard
>rmultinom(...). This seems like a good idea, but I wasn't sure about basing
>this on tabulate( sample( ...)). Despite the attractive succinctness, this
>could be very slow and use lots of memory if n or size is large. Instead,
>I've tended to use a loop over the boxes of the multinomial, taking
>successive binomial samples from the remaining available observations each
>time.
>
>To check, I did some timing comparisons. Unsurprisingly, the tabulate()
>version is much slower for large "size" (number of observations per box).
>However, the successive binomial version is slower when observations are
>sparse, presumably because it spends most of its time filling in zeros that
>tabulate() doesn't worry about.
>
>test> # Low dimension, many observations per box
>test> length( ranvec)
>[1] 7
>test> system.time( sucbin.rmultinom( 1000, 1000, ranvec))
>[1] 0.02 0.00 0.02 NA NA
>test> system.time( tab.rmultinom( 1000, 1000, ranvec))
>[1] 0.19 0.00 0.18 NA NA
>
>test> # High dimension, few observations per box
>test> length( ranvec.long)
>[1] 1000
>test> system.time( sucbin.rmultinom( 10000, 10, ranvec.long))
>[1] 18.42 0.06 18.54 NA NA
>test> system.time( tab.rmultinom( 10000, 10, ranvec.long))
>[1] 0.97 0.10 1.06 NA NA
>
>A good option for standard use might be a hybrid, with a "method" parameter
>saying which approach to use. Verrry roughly, this might default to "use
>successive binomials if size > length( prob)". The optimal cutoff seems to
>have a more complex dependence on n, size, and p, but this default choice
>should avoid the worst problems.
>
>As a suggestion, I've included code for a hybrid. Underscore-haters may
wish
>to stop reading here :)
>
># first, two standard functions of mine: these could go inside or be coded
>explicitly, of course
># Avoid unintended behaviour with : when 2nd arg is lower than 1st
>assign( '%upto%', function( from, to) if( from <= to) from:to else numeric(
>0))
>
># Trim tail of vector
>clip_ function( x, n=1) x[ 1 %upto% ( length( x) - n)]
>
>rmultinom_ function( n, size, prob, method=if(k>size) "tabulate" else
>"sucbin") {
> # returns a n X length( prob) matrix of integers, each row adding up to
>size
>
> k_ length( prob)
> if( pmatch( method, c( 'sucbin', 'tabulate')) == 1) { # successive
>binomials
> prob_ prob / sum( prob)
> prob[-1]_ prob[-1] / (1-clip( cumsum( prob)))
> x_ matrix( as.integer( 0), n, k)
> nn_ rep( size, n)
>
> for( i in 1 %upto% (k-1)) {
> x[,i]_ as.integer( rbinom( n, size=nn, prob=prob[ i]))
> nn_ nn - x[,i]
> }
>
> x[,k]_ nn
> } else # tabulation
> x_ matrix( tabulate( sample( k, n * size, repl = TRUE, prob) + k * 0
>%upto% (n - 1), nbins = n * k),
> nrow = n, ncol = k, byrow = TRUE) # NB using ":" instead of
"%upto%"
>would fail for n=1
>
>return( x)
>}
>
>cheers
>Mark
>
>*******************************
>
>Mark Bravington
>CSIRO (CMIS)
>PO Box 1538
>Castray Esplanade
>Hobart
>TAS 7001
>
>phone (61) 3 6232 5118
>fax (61) 3 6232 5012
>Mark.Bravington at csiro.au
>-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
.-.-
>r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
>Send "info", "help", or "[un]subscribe"
>(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
>_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
._._
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
More information about the R-help
mailing list