[R] Sum of Bernoullis with varying probabilities

Gabor Grothendieck ggrothendieck at gmail.com
Sun Oct 8 08:04:55 CEST 2006


Or perhaps its clearer (and saves a bit of space) to use apply...prod
here instead of exp...log:

fft(apply(mvfft(t(cbind(1-p,p,0,0,0))), 1, prod), inverse = TRUE)/5


On 10/7/06, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> One can get a one-line solution by taking the product of the FFTs.
> For example, let p <- 1:4/8 be the probabilities.  Then the solution is:
>
> fft(exp(rowSums(log(mvfft(t(cbind(1-p,p,0,0,0)))))), inverse = TRUE)/5
>
> On 10/7/06, Ted Harding <Ted.Harding at nessie.mcc.ac.uk> wrote:
> > Hi again.
> > I had suspected that doing the calculation by a convolution
> > method might be both straightforward and efficient in R.
> >
> > I've now located convolve() (in 'base'!!), and have a solution
> > using this function. Details below. Original statement of
> > problem, and method originally proposed, repeated below for
> > reference.
> >
> > On 10/6/06, Ted Harding <Ted.Harding at nessie.mcc.ac.uk> wrote:
> > > Hi Folks,
> > >
> > > Given a series of n independent Bernoulli trials with
> > > outcomes Yi (i=1...n) and Prob[Yi = 1] = Pi, I want
> > >
> > >   P = Prob[sum(Yi) = r]  (r = 0,1,...,n)
> > >
> > > I can certainly find a way to do it:
> > >
> > > Let p be the vector c(P1,P2,...,Pn).
> > > The cases r=0 and r=n are trivial (and also are exceptions
> > > for the following routine).
> > >
> > > For a given value of r in (1:(n-1)),
> > >
> > >   library(combinat)
> > >   Set <- (1:n)
> > >   Subsets <- combn(Set,r)
> > >   P <- 0
> > >   for(i in (1:dim(Subsets)[2])){
> > >     ix <- numeric(n)
> > >     ix[Subsets[,i]] <- 1
> > >     P <- P + prod((p^ix) * ((1-p)^(1-ix)))
> > >   }
> >
> > The convolution method implemented in convolve computes, for
> > two series X[1],X[2],...,X[n] and Y[1],Y[2],...,Y[m] the
> > series (k = 1, 2, ... , n+m-1):
> >
> >  Z[k] = sum( X[k-m+i]*Y[i] )
> >
> > over valid values of i, i.e. products of terms of X matched
> > with terms of a shifted Y, using the "open" method (see
> > ?convolve).
> >
> > This is not quite what is wanted for the type of convolution
> > needed to compute the distribution of the sum of two integer
> > random variables, since one needs to reverse one of the series
> > being convolved. This is the basis of the following:
> >
> > Given a vector p of probabilities P1, P2, P3, for "Y=1" in
> > successive trials, I need to convolve the successive Bernoulli
> > distributions (1-P1,P1), (1-P2,P2), ... , (1-Pn,Pn). Hence:
> >
> >  ps<-cbind(1-p,p); u<-ps[1,]; u1<-u
> >  for(i in (2:16)){
> >    u<-convolve(u,rev(ps[i,]),type="o")
> >  }
> >
> > In the case I was looking at, I had already obtained the vector
> > P of probabilities P(Sum(Y)=0), P(sum(Y)=1), ... by the method
> > previously described (repeated above); and here n=16.
> >
> > Having obtained u by the above routine, I can now compare the
> > results, u with P:
> >
> > cbind((0:16),u,P)
> >   r             u            P
> >   0  1.353353e-01 1.353353e-01
> >   1  3.007903e-01 3.007903e-01
> >   2  2.976007e-01 2.976007e-01
> >   3  1.747074e-01 1.747074e-01
> >   4  6.826971e-02 6.826971e-02
> >   5  1.884969e-02 1.884969e-02
> >   6  3.804371e-03 3.804371e-03
> >   7  5.720398e-04 5.720398e-04
> >   8  6.463945e-05 6.463945e-05
> >   9  5.489454e-06 5.489454e-06
> >  10  3.473997e-07 3.473997e-07
> >  11  1.607822e-08 1.607822e-08
> >  12  5.262533e-10 5.262532e-10
> >  13  1.148626e-11 1.148618e-11
> >  14  1.494650e-13 1.493761e-13
> >  15  9.008700e-16 8.764298e-16
> >  16 -1.896716e-17 1.313261e-19
> >
> > so, apart from ignorable differences in the very small
> > probabilities for r>11, they agree. I have to admit, though,
> > that I had to tread carefully (and experiment with Binomials)
> > to make sure of exactly how to introduce the reversal
> > (at "rev(ps[i,])").
> >
> > And the convolution method is *very* much faster than the
> > selection of all subsets method!
> >
> > However, I wonder whether the "subsets" method may be the more
> > accurate of the two (not that it really matters).
> >
> > Best wishes to all,
> > Ted.
> >
> > --------------------------------------------------------------------
> > E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
> > Fax-to-email: +44 (0)870 094 0861
> > Date: 07-Oct-06                                       Time: 22:03:27
> > ------------------------------ XFMail ------------------------------
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>



More information about the R-help mailing list