[R] faster computation of cumulative multinomial distribution

Sat Mar 31 00:00:03 CEST 2007

Theo Borm wrote:
> 
> I have a series of /unequal/ probabilities [p1,p2,...,pk], describing
> mutually exclusive events, and a "remainder" class with a probability
> p0=1-p1-p2-....-pk, and need to calculate, for a given number of trials
> t>=k, the combined probability that each of the classes 1...k 
> contains at least 1 "event" (the remainder class may be empty).
> 
> To me this reaks of a sum of multinomial distributions, and indeed, I
> can readily calculate the correct answer for small figures t,k using 
> a small R program.
> 
The multinomial distribution is a sum of independent cathegorial
distributions, so there's no such thing as a sum of multinomial
distributions - unless the sum of multinomial distributions is
also a multinomial distribution.

Yikes. I think I am saying too much.

Just look here:
http://en.wikipedia.org/wiki/Multinomial_distribution

> However, in my typical experiment, k ranges from ~20-60 and t from
> ~40-100, and having to calculate these for about 6e9 experiments, a
> quick calculation on the back of a napkin shows me that I will not be
> able to complete these calculations before the expected end
> of the universe.
>
But what do you want to calculate? The probabilities?

> I already figured out that in this particular case I may use reciprocal
> probabilities, and if I do this I get an equation with "only" 2^k 
> terms, which would shorten my computations to a few decades.
> 
> Isn't there a faster (numerical approximation?) way to do this?
> 
I don't know what you want to do, but maybe this is the case
where a Monte Carlo Simulation would give a fast and accurate
result.

Let's say you want to estimate some function of the outcome 
of this multinomial. Just to fix ideas:

# p - the probabilities
p <- c(1, 2, 3, 2, 4, 4, 2, 1, 2, 3, 1, 2, 3, 4, 3, 2, 1, 2) 
# normalize so that sum(p) = 1
p <- p / sum(p)
# get k
k <- length(p)
# number of trials, say 100
t <- 100
# x is the simulation for each trial. x[1] is the number of
# times we got the thing with probability p[1], etc
#
# x <- rmultinom(1, k, p) # simulates 1 experiment
#
# f is the function that you will use to "evaluate" x
#
f <- function(x) {
  return(sum(x * 1.05^-(0:(length(x)-1))))  # anything can come here
}
#
# A Monte Carlo simulation will generate "a lot" of xs and
# get a distribution of f(x)
#
# Simulate 10000 xs
x <- rmultinom(10000, k, p) # takes almost zero time
# Evaluate 10000 f(x). apply(matrix, 1 (rows) or 2 (coluns), function)
f.dist <- apply(x, 2, f)
# analyse it
hist(f.dist) # etc

> R has dbinom /and/ pbinom functions, but unfortunately only a dmultinom
> and no pmultinom function... perhaps because there is no (known) 
> faster way?
> 
There's a rmultinom

Alberto Monteiro