[Rd] Rcpp too good to be true?

Martin Morgan mtmorgan at fhcrc.org
Wed Dec 14 01:52:48 CET 2011


On 12/13/2011 03:48 PM, Jeffrey Pollock wrote:
> Hello all,
>
> I've been working on a package to do various things related to the
> Conway-Maxwell-Poisson distribution and wanted to be able to make fast
> random draws from the distribution. My R code was running quite slow so I
> decided to give Rcpp a bash. I had used it before but only for extremely
> basic stuff and always using inline. This time I decided to give making a
> proper package a go.
>
> First of all I should say that this was incredibly easy due to
> Rcpp.package.skeleton() and the countless answers to quesions online and
> documentation!
>
> Secondly, I'm worried that my speedup has been so massive (over 500x !!!)
> that I think I've made a mistake, hence my post here.
>
> Here is all my code, if someone has a minute to point out anything wrong
> (or even if its correct and there is room for improvement, im pretty new to
> this) it would be much appreciated. I've had a simple look at the results
> and they look fine, but seriously, 500x faster?!
>
> function in R;
> library(compiler)
>
> Rrcomp<- cmpfun(
>          function(n, lam, nu, max = 100L) {
>              ans<- integer(n)
>              dist<- dcomp(0:max, lam, nu, max)
>              u<- runif(n)
>              for (i in 1:n) {
>                  count<- 0L
>                  pr<- dist[1L]
>                  while (pr<  u[i]) {
>                      count<- count + 1L
>                      pr<- pr + dist[count + 1L]
>                  }
>                  ans[i]<- count
>              }
>              return(ans)
>          }
> )

Hi Jeff

Not really what you're asking about, but looks like you're sampling with 
replacement from the sequence 0:(dist-1) n times with probability dist, so

Rrcomp.1 <-
     function(n, lam, nu, max = 100L)
{
     dist <- dcomp(0:max, lam, nu, max)
     sample(seq_along(dist) - 1L, n, TRUE, prob=dist)
}

and

 > system.time(res <- table(Rrcomp(n, lam, nu))); res
    user  system elapsed
   1.493   0.000   1.495

    0    1    2    3    4    5    6    7    8    9   10   11   12   13
  355 1656 4070 6976 8745 8861 7275 5214 3357 1892  926  399  165   69
   14   15   16   17
   24   11    2    3
 > system.time(res <- table(Rrcomp.1(n, lam, nu))); res
    user  system elapsed
   0.029   0.000   0.028

    0    1    2    3    4    5    6    7    8    9   10   11   12   13
  333 1754 4096 6876 8964 8799 7399 5030 3215 1877  951  432  184   61
   14   15   16
   23    5    1

Martin

> dcomp<- function(y, lam, nu, max = 100L) {
>      Z<- function(lam, nu, max) {
>          sum<- 0L
>          for(i in 0L:max) {
>              sum<- sum + lam^i / factorial(i)^nu
>          }
>          return(sum)
>      }
>      return(lam^y / (factorial(y)^nu * Z(lam, nu, max)))
> }
>
> function in Rcpp;
> header file;
>
> #include<Rcpp.h>
>
> RcppExport SEXP rcomp(SEXP n_, SEXP dist_);
>
> source file;
>
> #include "rcomp.h"
>
> SEXP rcomp(SEXP n_, SEXP dist_) {
>      using namespace Rcpp ;
>
>      int n = as<int>(n_);
>      NumericVector dist(dist_);
>
>      NumericVector ans(n);
>      int count;
>      double pr;
>      RNGScope scope;
>      NumericVector u = runif(n);
>
>      for (int i = 0; i<  n; ++i) {
>          count = 0;
>          pr = dist[0];
>          while (pr<  u[i]) {
>              count++;
>              pr += dist[count];
>          }
>          ans[i] = count;
>      }
>      return ans;
> }
>
> R call;
>
> rcomp<- function(n, lam, nu, max = 100){
>      dist<- dcomp(0:max, lam, nu, max)
>      .Call("rcomp", n = n, dist = dist, PACKAGE = "glmcomp")
> }
>
> Here are some results;
>> n<- 50000
>> lam<- 5
>> nu<- 1
>> rbind(table(rcomp(n, lam, nu))[1:10] / n, table(Rrcomp(n, lam, nu))[1:10]
> / n, dpois(0:9, lam))
>                 0          1          2         3         4
> 5         6
> [1,] 0.006440000 0.03124000 0.08452000 0.1392200 0.1747800 0.1755200
> 0.1490000
> [2,] 0.006660000 0.03232000 0.08412000 0.1425400 0.1779600 0.1748400
> 0.1445600
> [3,] 0.006737947 0.03368973 0.08422434 0.1403739 0.1754674 0.1754674
> 0.1462228
>               7          8          9
> [1,] 0.1063000 0.06538000 0.03534000
> [2,] 0.1039800 0.06492000 0.03624000
> [3,] 0.1044449 0.06527804 0.03626558
>
> (for nu = 1 the com-poisson distribution is the same as normal poisson)
>
>> benchmark(rcomp(n, lam, nu), Rrcomp(n, lam, nu), replications = 10, order
> = "relative")
>                  test replications elapsed relative user.self sys.self
> 2 Rrcomp(n, lam, nu)           10    2.03   1.0000      1.96     0.00
> 1  rcomp(n, lam, nu)           10 1172.51 577.5911   1164.50     0.08
>
> Thanks in advance if anyone has any time to have a look at this :)
>
> Jeff
>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793



More information about the R-devel mailing list