[Rd] efficiency of sample() with prob.

Bo Peng ben.bob at gmail.com
Fri Jun 24 17:32:45 CEST 2005


On 6/24/05, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
> `Research' involves looking at all the competitor methods, devising a
> near-optimal strategy and selecting amongst methods according to that
> strategy.  It is not a quick fix we are looking for but something that
> will be good for the long term.
> 

I am sorry but I am afraid that I do not have enough time and
background knowledge
to do a thorough research in this area. I have tried bisection search
method and the alias
method, the latter has greatly improved the performance of my bioinformatics
application. Since this method is the only one mentioned in Knuth's
book, I have no
idea about other alternatives. 

Attached is a slightly improved version of the alias method. It may be
helpful to people
having similar problems.

Thanks.

--
Bo Peng
Department of Statistics
Rice University.
http://bp6.stat.rice.edu:8080/

/* replace probSampleReplace in src/main/random.c */
static void ProbSampleReplace(int n, double *p, int *perm, int nans, int *ans)
{
    /* allocate memory for a, p, H and L is the second half of a  */
    double * q = Calloc(n, double);
    int * a = Calloc(2*n, int);
    int * LL = a+n, * HH = a+2*n-1;  /* start of H and L vector */
    int * L = LL, * H = HH;          /* end of H and L vector */
    int i, j, k;
    double rU;  /* U[0,1)*n */

    /* set up alias table */
    /* initialize q with n*p0,...n*p_n-1 */
    for(i=0; i<n; ++i)
        q[i] = p[i]*n;

    /* initialize a with indices */
    for(i=0; i<n; ++i)
        a[i] = i;

    /* set up H and L */
    for(i=0; i<n; ++i) {
        if( q[i] >= 1.)
            *H-- = i;
        else
            *L++ = i;
    }

    while( L != LL && H != HH ) {
        j = *(L-1);
        k = *(H+1);
        a[j] = k;
        q[k] += q[j] - 1;
        L--;                                  /* remove j from L */
        if( q[k] < 1. ) {
            *L++ = k;                         /* add k to L */
            ++H;                              /* remove k */
        }
    }

    /* generate sample */
    for (i = 0; i < nans; ++i) {
	rU = unif_rand() * n;

        k = (int)(rU);
        rU -= k;  /* rU becomes rU-[rU] */

        if( rU < q[k] )
            ans[i] = k+1;
        else
            ans[i] = a[k]+1;
    }
    Free(a);
    Free(q); 
}



More information about the R-devel mailing list