[R] Sum of Bernoullis with varying probabilities

Gabor Grothendieck ggrothendieck at gmail.com
Sun Oct 8 02:24:19 CEST 2006


One can get a one-line solution by taking the product of the FFTs.
For example, let p <- 1:4/8 be the probabilities.  Then the solution is:

fft(exp(rowSums(log(mvfft(t(cbind(1-p,p,0,0,0)))))), inverse = TRUE)/5

On 10/7/06, Ted Harding <Ted.Harding at nessie.mcc.ac.uk> wrote:
> Hi again.
> I had suspected that doing the calculation by a convolution
> method might be both straightforward and efficient in R.
>
> I've now located convolve() (in 'base'!!), and have a solution
> using this function. Details below. Original statement of
> problem, and method originally proposed, repeated below for
> reference.
>
> On 10/6/06, Ted Harding <Ted.Harding at nessie.mcc.ac.uk> wrote:
> > Hi Folks,
> >
> > Given a series of n independent Bernoulli trials with
> > outcomes Yi (i=1...n) and Prob[Yi = 1] = Pi, I want
> >
> >   P = Prob[sum(Yi) = r]  (r = 0,1,...,n)
> >
> > I can certainly find a way to do it:
> >
> > Let p be the vector c(P1,P2,...,Pn).
> > The cases r=0 and r=n are trivial (and also are exceptions
> > for the following routine).
> >
> > For a given value of r in (1:(n-1)),
> >
> >   library(combinat)
> >   Set <- (1:n)
> >   Subsets <- combn(Set,r)
> >   P <- 0
> >   for(i in (1:dim(Subsets)[2])){
> >     ix <- numeric(n)
> >     ix[Subsets[,i]] <- 1
> >     P <- P + prod((p^ix) * ((1-p)^(1-ix)))
> >   }
>
> The convolution method implemented in convolve computes, for
> two series X[1],X[2],...,X[n] and Y[1],Y[2],...,Y[m] the
> series (k = 1, 2, ... , n+m-1):
>
>  Z[k] = sum( X[k-m+i]*Y[i] )
>
> over valid values of i, i.e. products of terms of X matched
> with terms of a shifted Y, using the "open" method (see
> ?convolve).
>
> This is not quite what is wanted for the type of convolution
> needed to compute the distribution of the sum of two integer
> random variables, since one needs to reverse one of the series
> being convolved. This is the basis of the following:
>
> Given a vector p of probabilities P1, P2, P3, for "Y=1" in
> successive trials, I need to convolve the successive Bernoulli
> distributions (1-P1,P1), (1-P2,P2), ... , (1-Pn,Pn). Hence:
>
>  ps<-cbind(1-p,p); u<-ps[1,]; u1<-u
>  for(i in (2:16)){
>    u<-convolve(u,rev(ps[i,]),type="o")
>  }
>
> In the case I was looking at, I had already obtained the vector
> P of probabilities P(Sum(Y)=0), P(sum(Y)=1), ... by the method
> previously described (repeated above); and here n=16.
>
> Having obtained u by the above routine, I can now compare the
> results, u with P:
>
> cbind((0:16),u,P)
>   r             u            P
>   0  1.353353e-01 1.353353e-01
>   1  3.007903e-01 3.007903e-01
>   2  2.976007e-01 2.976007e-01
>   3  1.747074e-01 1.747074e-01
>   4  6.826971e-02 6.826971e-02
>   5  1.884969e-02 1.884969e-02
>   6  3.804371e-03 3.804371e-03
>   7  5.720398e-04 5.720398e-04
>   8  6.463945e-05 6.463945e-05
>   9  5.489454e-06 5.489454e-06
>  10  3.473997e-07 3.473997e-07
>  11  1.607822e-08 1.607822e-08
>  12  5.262533e-10 5.262532e-10
>  13  1.148626e-11 1.148618e-11
>  14  1.494650e-13 1.493761e-13
>  15  9.008700e-16 8.764298e-16
>  16 -1.896716e-17 1.313261e-19
>
> so, apart from ignorable differences in the very small
> probabilities for r>11, they agree. I have to admit, though,
> that I had to tread carefully (and experiment with Binomials)
> to make sure of exactly how to introduce the reversal
> (at "rev(ps[i,])").
>
> And the convolution method is *very* much faster than the
> selection of all subsets method!
>
> However, I wonder whether the "subsets" method may be the more
> accurate of the two (not that it really matters).
>
> Best wishes to all,
> Ted.
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
> Fax-to-email: +44 (0)870 094 0861
> Date: 07-Oct-06                                       Time: 22:03:27
> ------------------------------ XFMail ------------------------------
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



More information about the R-help mailing list