[R] faster computation of cumulative multinomial distribution

Mon Apr 2 15:42:35 CEST 2007

Theo Borm wrote:

>>> I have a hunch that it won't work as my p vector typically contains
>>> values like:
>>
>> p<-c(0.99, 0.005, 0.003, 0.001, 0.0005, 0.0003, 0.0001, 0.00005,
>> 0.00003, 0.00002).
>>
>> 
>> So you should expect that only 1/50000 will be in the low-prob
>> hole in the average?
> 
> Or less. And normally I'll have 30-60 holes, with the "remainder" 
> hole containing > 98% of the events.
> 
I think the problem here is not with the low probability
associated to the total of the holes, but with the different
probabilities associated to the different low-probability holes.

>> Hmmm.... That's right. No way to do a Monte Carlo here!
> 
> Yes. Its back to square one for me. Maybe I should get a bigger 
> computer. I heard that IBM is selling some quite fancy stuff....
> 
> http://www.top500.org/list/2006/11/100
> 
> Would probably be quite a challenge to coerce R to run on such a 
> machine   ;-)
> 
I don't think a bigger computer will solve it, after all, you might
be dealing with probabilities close to 1:(every atom in the Universe)

Hmmm... Maybe you can break the problem into several (manageable)
parts. For example, you can divide the problem into this:

(a) given N shots, how many will hit the lowest-prob hole (the 0.00002)?

This is easy: it's the binomial distribution. There are three cases
to be considered here: those that hit zero times (and this will carry
a zero forever), those that hit exactly once, and those that hit
two or more.

Unless the two or more is not negligible compared to the exactly
once, we can treat it as two.

The next holes could be done considering the independence of
the conditional probabilities.

For example, hitting exactly once every hole would
be 
k <- length(p)
prod(dbinom(1, N - (0:(k-2)), p[k - (0:(k-2))]))

BTW, this is exactly
dmultinom(c(N-k+1, rep(1, k-1)), prob=p)

What I am missing here?

Alberto Monteiro