[R] Sum of Bernoullis with varying probabilities

Sun Oct 8 11:01:41 CEST 2006

On 08-Oct-06 Gabor Grothendieck wrote:
> Or perhaps its clearer (and saves a bit of space) to use apply...prod
> here instead of exp...log:
> 
> fft(apply(mvfft(t(cbind(1-p,p,0,0,0))), 1, prod), inverse = TRUE)/5

Yes, that's neat (in either version). With the example I have,
where length(p)=16, I applied your suggestion above in the form

  v<-fft(apply(mvfft(t(cbind(1-p,p,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0))),
               1,prod), inverse = TRUE)/17

(making the number of columns for cbind equal to 17 = 16+1). Now,
comparing this result with the P I got by brute force (subsets
selection method), and removing the imaginary parts from v:

cbind(r=(0:16),v=Re(v),P)
   r            v            P
   0 1.353353e-01 1.353353e-01
   1 3.007903e-01 3.007903e-01
   2 2.976007e-01 2.976007e-01
   3 1.747074e-01 1.747074e-01
   4 6.826971e-02 6.826971e-02
   5 1.884969e-02 1.884969e-02
   6 3.804371e-03 3.804371e-03
   7 5.720398e-04 5.720398e-04
   8 6.463945e-05 6.463945e-05
   9 5.489454e-06 5.489454e-06
  10 3.473997e-07 3.473997e-07
  11 1.607822e-08 1.607822e-08
  12 5.262532e-10 5.262532e-10
  13 1.148620e-11 1.148618e-11
  14 1.494031e-13 1.493761e-13
  15 8.887779e-16 8.764298e-16
  16 1.434973e-17 1.313261e-19

so this calculation gives a better result than convolve().
The only "fly in the ointment" (which comes down to how one
sets up the arguments to cbind(), so is quite easily handled
in general application) is the variable number of columns
required according to the length of p.

Your suggestion worked nicely!
Thanks,
ted.

> 
> 
> On 10/7/06, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
>> One can get a one-line solution by taking the product of the FFTs.
>> For example, let p <- 1:4/8 be the probabilities.  Then the solution
>> is:
>>
>> fft(exp(rowSums(log(mvfft(t(cbind(1-p,p,0,0,0)))))), inverse = TRUE)/5
>>
>> On 10/7/06, Ted Harding <Ted.Harding at nessie.mcc.ac.uk> wrote:
>> > Hi again.
>> > I had suspected that doing the calculation by a convolution
>> > method might be both straightforward and efficient in R.
>> >
>> > I've now located convolve() (in 'base'!!), and have a solution
>> > using this function. Details below. Original statement of
>> > problem, and method originally proposed, repeated below for
>> > reference.
>> >
>> > On 10/6/06, Ted Harding <Ted.Harding at nessie.mcc.ac.uk> wrote:
>> > > Hi Folks,
>> > >
>> > > Given a series of n independent Bernoulli trials with
>> > > outcomes Yi (i=1...n) and Prob[Yi = 1] = Pi, I want
>> > >
>> > >   P = Prob[sum(Yi) = r]  (r = 0,1,...,n)
>> > >
>> > > I can certainly find a way to do it:
>> > >
>> > > Let p be the vector c(P1,P2,...,Pn).
>> > > The cases r=0 and r=n are trivial (and also are exceptions
>> > > for the following routine).
>> > >
>> > > For a given value of r in (1:(n-1)),
>> > >
>> > >   library(combinat)
>> > >   Set <- (1:n)
>> > >   Subsets <- combn(Set,r)
>> > >   P <- 0
>> > >   for(i in (1:dim(Subsets)[2])){
>> > >     ix <- numeric(n)
>> > >     ix[Subsets[,i]] <- 1
>> > >     P <- P + prod((p^ix) * ((1-p)^(1-ix)))
>> > >   }
>> >
>> > The convolution method implemented in convolve computes, for
>> > two series X[1],X[2],...,X[n] and Y[1],Y[2],...,Y[m] the
>> > series (k = 1, 2, ... , n+m-1):
>> >
>> >  Z[k] = sum( X[k-m+i]*Y[i] )
>> >
>> > over valid values of i, i.e. products of terms of X matched
>> > with terms of a shifted Y, using the "open" method (see
>> > ?convolve).
>> >
>> > This is not quite what is wanted for the type of convolution
>> > needed to compute the distribution of the sum of two integer
>> > random variables, since one needs to reverse one of the series
>> > being convolved. This is the basis of the following:
>> >
>> > Given a vector p of probabilities P1, P2, P3, for "Y=1" in
>> > successive trials, I need to convolve the successive Bernoulli
>> > distributions (1-P1,P1), (1-P2,P2), ... , (1-Pn,Pn). Hence:
>> >
>> >  ps<-cbind(1-p,p); u<-ps[1,]; u1<-u
>> >  for(i in (2:16)){
>> >    u<-convolve(u,rev(ps[i,]),type="o")
>> >  }
>> >
>> > In the case I was looking at, I had already obtained the vector
>> > P of probabilities P(Sum(Y)=0), P(sum(Y)=1), ... by the method
>> > previously described (repeated above); and here n=16.
>> >
>> > Having obtained u by the above routine, I can now compare the
>> > results, u with P:
>> >
>> > cbind((0:16),u,P)
>> >   r             u            P
>> >   0  1.353353e-01 1.353353e-01
>> >   1  3.007903e-01 3.007903e-01
>> >   2  2.976007e-01 2.976007e-01
>> >   3  1.747074e-01 1.747074e-01
>> >   4  6.826971e-02 6.826971e-02
>> >   5  1.884969e-02 1.884969e-02
>> >   6  3.804371e-03 3.804371e-03
>> >   7  5.720398e-04 5.720398e-04
>> >   8  6.463945e-05 6.463945e-05
>> >   9  5.489454e-06 5.489454e-06
>> >  10  3.473997e-07 3.473997e-07
>> >  11  1.607822e-08 1.607822e-08
>> >  12  5.262533e-10 5.262532e-10
>> >  13  1.148626e-11 1.148618e-11
>> >  14  1.494650e-13 1.493761e-13
>> >  15  9.008700e-16 8.764298e-16
>> >  16 -1.896716e-17 1.313261e-19
>> >
>> > so, apart from ignorable differences in the very small
>> > probabilities for r>11, they agree. I have to admit, though,
>> > that I had to tread carefully (and experiment with Binomials)
>> > to make sure of exactly how to introduce the reversal
>> > (at "rev(ps[i,])").
>> >
>> > And the convolution method is *very* much faster than the
>> > selection of all subsets method!
>> >
>> > However, I wonder whether the "subsets" method may be the more
>> > accurate of the two (not that it really matters).
>> >
>> > Best wishes to all,
>> > Ted.
>> >
>> > --------------------------------------------------------------------
>> > E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
>> > Fax-to-email: +44 (0)870 094 0861
>> > Date: 07-Oct-06                                       Time: 22:03:27
>> > ------------------------------ XFMail ------------------------------
>> >
>> > ______________________________________________
>> > R-help at stat.math.ethz.ch mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 08-Oct-06                                       Time: 10:01:37
------------------------------ XFMail ------------------------------