[R] Timings of function execution in R [was Re: R in Industry]
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri Feb 9 15:50:53 CET 2007
The other reason why pmin/pmax are preferable to your functions is that
they are fully generic. It is not easy to write C code which takes into
account that <, [, [<- and is.na are all generic. That is not to say that
it is not worth having faster restricted alternatives, as indeed we do
with rep.int and seq.int.
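
As a small illustration of what "fully generic" means in practice, the sketch below applies pmax to classed and character inputs and shows the restricted counterparts mentioned above. The example values are invented for this sketch and are not from the thread.

```r
# pmax() works through the generic comparison and subsetting operators,
# so it is not limited to plain numeric vectors.
d1 <- as.Date(c("2007-02-08", "2007-02-09"))
d2 <- as.Date(c("2007-02-09", "2007-02-08"))
later <- pmax(d1, d2)          # the result is still a Date vector
print(later)

# Character vectors are compared as strings.
chr <- pmax(c("a", "z"), c("m", "b"))
print(chr)

# rep.int() and seq.int() are the faster, restricted counterparts
# of the generic rep() and seq().
r <- rep.int(1:2, 3)
s <- seq.int(2, 10, by = 2)
print(r)
print(s)
```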
Anything that uses arithmetic is making strong assumptions about the
inputs. It ought to be possible to write a fast C version that worked for
atomic vectors (logical, integer, real and character), but is there
any evidence of profiled real problems where speed is an issue?
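
To make the "strong assumptions" concrete, here is a hedged sketch using the pmin2/pmax2 definitions from the quoted message below. The test values are chosen for illustration only; they are not from the thread.

```r
# Definitions copied from the quoted robustbase code.
pmin2 <- function(k, x) (x + k - abs(x - k)) / 2
pmax2 <- function(k, x) (x + k + abs(x - k)) / 2

# Rounding: 1 + 1e-17 is rounded to 1 in double precision,
# so the smaller value is lost in the arithmetic.
pmin(1, 1e-17)     # 1e-17
pmin2(1, 1e-17)    # 0

# Inf - Inf is NaN, and the NaN propagates through the formula.
pmax(Inf, Inf)     # Inf
pmax2(Inf, Inf)    # NaN

# The intermediate sum can overflow even when both inputs are finite.
pmax2(1e308, 1e308)   # Inf

# Non-numeric input is rejected by the arithmetic operators,
# whereas pmax("a", "b") is valid.
res <- try(pmax2("a", "b"), silent = TRUE)
inherits(res, "try-error")   # TRUE
```

None of this matters for well-scaled numeric data, which is the use case the arithmetic versions were written for.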
On Fri, 9 Feb 2007, Martin Maechler wrote:
>>>>>> "Ravi" == Ravi Varadhan <rvaradhan at jhmi.edu>
>>>>>> on Thu, 8 Feb 2007 18:41:38 -0500 writes:
>
> Ravi> Hi,
> Ravi> "greaterOf" is indeed an interesting function. It is much faster than the
> Ravi> equivalent R function, "pmax", because pmax does a lot of checking for
> Ravi> missing data and for recycling. Tom Lumley suggested a simple function to
> Ravi> replace pmax, without these checks, that is analogous to greaterOf, which I
> Ravi> call fast.pmax.
>
> Ravi> fast.pmax <- function(x,y) {i<- x<y; x[i]<-y[i]; x}
>
> Ravi> Interestingly, greaterOf is even faster than fast.pmax, although you have to
> Ravi> be dealing with very large vectors (O(10^6)) to see any real difference.
>
> Yes. Indeed, I have a file, first version dated from 1992,
> where I explore the "slowness" of pmin() and pmax() (in S-plus
> 3.2 then). I have since added quite a few experiments and versions to that
> file.
>
> As a consequence, in the robustbase CRAN package (which is only a bit
> more than a year old though), there's a file, available as
> https://svn.r-project.org/R-packages/robustbase/R/Auxiliaries.R
> with the very simple content {note line 3 !}:
>
> -------------------------------------------------------------------------
> ### Fast versions of pmin() and pmax() for 2 arguments only:
>
> ### FIXME: should rather add these to R
> pmin2 <- function(k,x) (x+k - abs(x-k))/2
> pmax2 <- function(k,x) (x+k + abs(x-k))/2
> -------------------------------------------------------------------------
>
> {the "funny" argument name 'k' comes from the use of these to
> compute Huber's psi() fast :
>
> psiHuber <- function(x,k) pmin2(k, pmax2(- k, x))
> curve(psiHuber(x, 1.35), -3,3, asp = 1)
> }
>
> One point *is* that I think proper function names would be pmin2() and
> pmax2() since they work with exactly 2 arguments,
> whereas IIRC the feature to work with '...' is exactly the
> reason that pmax() and pmin() are so much slower.
>
> I haven't checked if Gabor's
> pmax2.G <- function(x,y) {z <- x > y; z * (x-y) + y}
> is even faster than the one using abs().
> It may have the advantage of giving *identical* results (to the
> last bit!) to pmax() which my version does not --- IIRC the
> only reason I did not follow my own 'FIXME' above.
>
> I had then planned to implement pmin2() and pmax2() in C code, trivially,
> and hence get identical (to the last bit!) behavior as
> pmin()/pmax(); but I now tend to think that the proper approach is to
> code pmin() and pmax() via .Internal() and hence C code ...
>
> [Not before DSC and my vacations though!!]
>
> Martin Maechler, ETH Zurich
>
>
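
For anyone wanting to check the quoted claims on their own machine, a rough sketch follows. The function definitions are copied from the messages above; the seed, vector length and test values are arbitrary choices for this sketch, and timings will vary with the R version and hardware.

```r
# Functions as quoted in the thread.
fast.pmax <- function(x, y) { i <- x < y; x[i] <- y[i]; x }
pmin2     <- function(k, x) (x + k - abs(x - k)) / 2
pmax2     <- function(k, x) (x + k + abs(x - k)) / 2
pmax2.G   <- function(x, y) { z <- x > y; z * (x - y) + y }
psiHuber  <- function(x, k) pmin2(k, pmax2(-k, x))

set.seed(1)                      # arbitrary seed for this illustration
x <- rnorm(1e6); y <- rnorm(1e6)
ref <- pmax(x, y)

# Agreement with pmax() on NA-free doubles: the subsetting version copies
# values unchanged, the arithmetic versions agree only up to rounding.
same_fast <- identical(fast.pmax(x, y), ref)
close_abs <- all.equal(pmax2(x, y), ref)
close_G   <- all.equal(pmax2.G(x, y), ref)

# pmax2.G is not bit-identical to pmax() in general:
# (1e-17 - (-1)) rounds to 1, and 1 + (-1) is 0.
g_small <- pmax2.G(1e-17, -1)    # 0, whereas pmax(1e-17, -1) is 1e-17

# Missing values: pmax() propagates NA, while fast.pmax() fails because
# NAs are not allowed in the logical index of a subscripted assignment.
xa <- c(1, NA, 3); ya <- c(2, 2, 2)
na_ref <- pmax(xa, ya)           # 2 NA 3
na_res <- try(fast.pmax(xa, ya), silent = TRUE)

# Huber's psi via the fast versions against the pmin()/pmax() form.
psi_ok <- all.equal(psiHuber(x, 1.35), pmin(1.35, pmax(-1.35, x)))

# Timings (no expected values given: they depend on the machine).
print(system.time(pmax(x, y)))
print(system.time(fast.pmax(x, y)))
print(system.time(pmax2(x, y)))
print(system.time(pmax2.G(x, y)))
```

Profiling a real application with Rprof() would be more informative than these isolated timings.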
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595