[R] Timings of function execution in R [was Re: R in Industry]

Fri Feb 9 16:05:07 CET 2007

On 2/9/07, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
> The other reason why pmin/pmax are preferable to your functions is that
> they are fully generic.  It is not easy to write C code which takes into
> account that <, [, [<- and is.na are all generic.  That is not to say that
> it is not worth having faster restricted alternatives, as indeed we do
> with rep.int and seq.int.
>
> Anything that uses arithmetic is making strong assumptions about the
> inputs.  It ought to be possible to write a fast C version that worked for
> atomic vectors (logical, integer, real and character), but is there
> any evidence of profiled real problems where speed is an issue?

Yes.  I don't have the profiled timings available now, and one would
need to go back to earlier versions of R to reproduce them, but I did
encounter a situation where the bottleneck in a practical computation
was pmin/pmax.  The binomial and poisson families for generalized
linear models used pmin and pmax to avoid boundary conditions when
evaluating the inverse link and other functions.  When I profiled the
execution of some generalized linear model and, more importantly for
me, generalized linear mixed model fits, these calls to pmin and pmax
were the bottleneck.  That is why I moved some of the calculations for
the binomial and poisson families in the stats package to compiled
code.

In that case I didn't rewrite the general form of pmin and pmax; I
replaced the specific calls with compiled code.
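
For readers who have not seen this kind of guard, here is a minimal
sketch of an inverse logit link that clamps the linear predictor with
pmin/pmax before exponentiating.  It is written in the spirit of the R
code that the families used, not copied from the stats sources; the
function name linkinv_logit and the choice of threshold are assumptions
made for illustration.

```r
# Hypothetical helper: inverse logit with the linear predictor clamped
# so that the result stays strictly inside (0, 1).
linkinv_logit <- function(eta) {
    thresh <- -log(.Machine$double.eps)      # roughly 36
    eta <- pmin(thresh, pmax(eta, -thresh))  # the two calls that show up in profiles
    exp(eta) / (1 + exp(eta))
}

p <- linkinv_logit(c(-1000, 0, 1000))

# Without the clamp, a large eta gives Inf/Inf, i.e. NaN.
unguarded <- exp(1000) / (1 + exp(1000))
```

Because a function like this is evaluated on the whole vector of linear
predictors at every iteration of a fit, the cost of the two calls adds
up quickly.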

>
> On Fri, 9 Feb 2007, Martin Maechler wrote:
>
> >>>>>> "Ravi" == Ravi Varadhan <rvaradhan at jhmi.edu>
> >>>>>>     on Thu, 8 Feb 2007 18:41:38 -0500 writes:
> >
> >    Ravi> Hi,
> >    Ravi> "greaterOf" is indeed an interesting function.  It is much faster than the
> >    Ravi> equivalent R function, "pmax", because pmax does a lot of checking for
> >    Ravi> missing data and for recycling.  Tom Lumley suggested a simple function to
> >    Ravi> replace pmax, without these checks, that is analogous to greaterOf, which I
> >    Ravi> call fast.pmax.
> >
> >    Ravi> fast.pmax <- function(x,y) {i<- x<y; x[i]<-y[i]; x}
> >
> >    Ravi> Interestingly, greaterOf is even faster than fast.pmax, although you have to
> >    Ravi> be dealing with very large vectors (O(10^6)) to see any real difference.
> >
> > Yes. Indeed, I have a file, first version dated from 1992,
> > where I explore the "slowness" of pmin() and pmax() (in S-plus
> > 3.2 then). I have since added quite a few experiments and versions to
> > that file.
> >
> > As a consequence, in the robustbase CRAN package (which is only a bit
> > more than a year old though), there's a file, available as
> >  https://svn.r-project.org/R-packages/robustbase/R/Auxiliaries.R
> > with the very simple content {note line 3 !}:
> >
> > -------------------------------------------------------------------------
> > ### Fast versions of pmin() and pmax() for 2 arguments only:
> >
> > ### FIXME: should rather add these to R
> > pmin2 <- function(k,x) (x+k - abs(x-k))/2
> > pmax2 <- function(k,x) (x+k + abs(x-k))/2
> > -------------------------------------------------------------------------
> >
> > {the "funny" argument name 'k' comes from the use of these to
> > compute Huber's psi() fast :
> >
> >  psiHuber <- function(x,k)  pmin2(k, pmax2(- k, x))
> >  curve(psiHuber(x, 1.35), -3,3, asp = 1)
> > }
> >
> > One point *is* that I think proper function names would be pmin2() and
> > pmax2() since they work with exactly 2 arguments,
> > whereas IIRC the feature to work with '...' is exactly the
> > reason that pmax() and pmin() are so much slower.
> >
> > I haven't checked if Gabor's
> >     pmax2.G <- function(x,y) {z <- x > y; z * (x-y) + y}
> > is even faster than the abs() using one.
> > It may have the advantage of giving *identical* results (to the
> > last bit!)  to pmax()  which my version does not --- IIRC the
> > only reason I did not follow my own 'FIXME' above.
> >
> > I had then planned to implement pmin2() and pmax2() in C code, trivially,
> > and hence get identical (to the last bit!) behavior as
> > pmin()/pmax(); but I now tend to think that the proper approach is to
> > code pmin() and pmax() via .Internal() and hence C code ...
> >
> > [Not before DSC and my vacations though!!]
> >
> > Martin Maechler, ETH Zurich
>
> --
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>
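
The two-argument variants quoted above can be compared against pmax()
with a short script.  This is only a sketch under stated assumptions:
the test vectors, the seed and the repetition count are arbitrary
choices, timings will differ between machines and R versions, and the
overflow case is included to illustrate the point that arithmetic-based
versions make assumptions about their inputs.

```r
# Definitions as quoted in the thread.
fast.pmax <- function(x, y) { i <- x < y; x[i] <- y[i]; x }
pmax2     <- function(k, x) (x + k + abs(x - k)) / 2
pmax2.G   <- function(x, y) { z <- x > y; z * (x - y) + y }

set.seed(1)                    # arbitrary seed, for reproducibility only
x <- rnorm(1e5)
y <- rnorm(1e5)
ref <- pmax(x, y)

# The subsetting version copies elements, so it can match pmax() exactly
# on NA-free double vectors; the arithmetic versions are only compared
# up to numerical tolerance here.
same_fast <- identical(fast.pmax(x, y), ref)
close_abs <- isTRUE(all.equal(pmax2(x, y), ref))
close_G   <- isTRUE(all.equal(pmax2.G(x, y), ref))

# Arithmetic can overflow where a plain comparison cannot:
big      <- 1e308
overflow <- pmax2(big, big)    # x + k exceeds the largest finite double
exact    <- pmax(big, big)

# Rough timings; no particular ordering is claimed.
system.time(for (i in 1:20) pmax(x, y))
system.time(for (i in 1:20) pmax2(x, y))
system.time(for (i in 1:20) fast.pmax(x, y))
```

None of the restricted versions handles NA values, recycling or
non-numeric vectors the way pmax() does, which is the trade-off
discussed in the thread.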