[R] Timings of function execution in R [was Re: R in Industry]
Ravi Varadhan
rvaradhan at jhmi.edu
Fri Feb 9 00:41:38 CET 2007
Hi,
"greaterOf" is indeed an interesting function. It is much faster than the
equivalent R function, "pmax", because pmax does a lot of checking for
missing data and for recycling. Tom Lumley suggested a simple function to
replace pmax, without these checks, that is analogous to greaterOf, which I
call fast.pmax.
fast.pmax <- function(x, y) { i <- x < y; x[i] <- y[i]; x }
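To illustrate what skipping those checks means in practice, here is a small sketch (the test vectors are made up for illustration). With equal-length, NA-free inputs fast.pmax agrees with pmax, but when one argument is shorter than the other the shortcut does not recycle the way pmax does.

```r
# fast.pmax as defined above
fast.pmax <- function(x, y) { i <- x < y; x[i] <- y[i]; x }

# Equal lengths, no missing values: same answer as pmax
x <- c(1, 5, 2)
y <- c(3, 4, 2)
same <- identical(fast.pmax(x, y), pmax(x, y))

# Unequal lengths: pmax recycles ys, fast.pmax does not.
# The logical index i has length 4, but ys has only 2 elements,
# so ys[i] picks up an NA for the out-of-range position.
xs <- c(1, 5, 1, 5)
ys <- c(3, 3)
recycled <- pmax(xs, ys)       # 3 5 3 5
shortcut <- fast.pmax(xs, ys)  # 3 5 NA 5
```

So the shortcut is only a safe replacement when both vectors have the same length and contain no missing values.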
Interestingly, greaterOf is even faster than fast.pmax, although you have to
be dealing with very large vectors (on the order of 10^6 elements) to see any
real difference.
> n <- 2000000
>
> x1 <- runif(n)
> x2 <- rnorm(n)
> system.time( ans1 <- greaterOf(x1,x2) )
[1] 0.17 0.06 0.23 NA NA
> system.time( ans2 <- pmax(x1,x2) )
[1] 0.72 0.19 0.94 NA NA
> system.time( ans3 <- fast.pmax(x1,x2) )
[1] 0.29 0.05 0.35 NA NA
>
> all.equal(ans1,ans2,ans3)
[1] TRUE
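One caution about the final check: as far as I can tell from the documentation, all.equal compares only its first two arguments (target and current), and for numeric vectors a third positional argument is matched to the tolerance rather than treated as another vector to compare. A sketch of a pairwise check, on a much smaller made-up sample, might look like this:

```r
fast.pmax <- function(x, y) { i <- x < y; x[i] <- y[i]; x }
greaterOf <- function(x, y) { z <- x > y; z*x + (!z)*y }

set.seed(1)   # arbitrary seed, for reproducibility only
n  <- 1000    # much smaller than the 2e6 used above
x1 <- runif(n)
x2 <- rnorm(n)

ans1 <- greaterOf(x1, x2)
ans2 <- pmax(x1, x2)
ans3 <- fast.pmax(x1, x2)

# Compare pairwise; isTRUE() guards against all.equal returning a message
all.same <- isTRUE(all.equal(ans1, ans2)) && isTRUE(all.equal(ans2, ans3))
```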
Ravi.
-----------------------------------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619
Fax: (410) 614-9625
Email: rvaradhan at jhmi.edu
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
-----------------------------------------------------------------------------------
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Douglas Bates
Sent: Thursday, February 08, 2007 6:00 PM
To: R-Help
Subject: [R] Timings of function execution in R [was Re: R in Industry]
On 2/8/07, Albrecht, Dr. Stefan (AZ Private Equity Partner)
<stefan.albrecht at apep.com> wrote:
> Dear all,
>
> Thanks a lot for your comments.
>
> I very well agree with you that writing efficient code is about
> optimisation. The most important rules I know would be:
> - vectorization
> - pre-definition of vectors, etc.
> - use matrix instead of data.frame
> - do not use named objects
> - use pure matrix instead of involved S4 (perhaps also S3) objects (can
>   have enormous effects)
> - use function instead of expression
> - use compiled code
> - I guess indexing with numbers (better variables) is also much faster
>   than with text (names) (see also above)
> - I even made, for example, my own min, max, since they are slow, e.g.,
>
> greaterOf <- function(x, y){
> # Returns the greater of each pair of elements of x and y (numeric)
> # The length of x or y may be a multiple of the other
> z <- x > y
> z*x + (!z)*y
> }
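As a side note on the function quoted above: because greaterOf multiplies by a logical mask, an infinite value on the side that is not selected gets multiplied by zero, and 0 * Inf is NaN in floating-point arithmetic. A small sketch with made-up inputs:

```r
greaterOf <- function(x, y) { z <- x > y; z*x + (!z)*y }

x <- c(1, -Inf, 4)
y <- c(2,  0,   3)

masked  <- greaterOf(x, y)  # second element: 0 * -Inf + 1 * 0 is NaN
builtin <- pmax(x, y)       # second element: 0
```

For finite inputs the two agree, so this only matters if the data can contain infinite values.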
That's an interesting function. I initially was tempted to respond
that you have managed to reinvent a specialized form of the ifelse
function but then I decided to do the timings just to check (always a
good idea). The enclosed timings show that your function is indeed
faster than a call to ifelse. A couple of comments:
- I needed to make the number of components in the vectors x and y
quite large before I could get reliable timings on the system I am
using.
- The recommended way of doing timings is with the system.time function,
which makes an effort to minimize the effects of garbage collection on
the timings.
- Even when using system.time there is often a big difference in
timing between the first execution of a function call that generates a
large object and subsequent executions of the same function call.
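A minimal sketch of these three points, assuming the greaterOf definition quoted above; the vector length and the number of repetitions are arbitrary choices, and the timings will differ from machine to machine:

```r
greaterOf <- function(x, y) { z <- x > y; z*x + (!z)*y }

n <- 1e6   # large enough for measurable timings
x <- runif(n)
y <- rnorm(n)

# Time each call several times; the first run is often slower than the rest
reps <- 3
t.greaterOf <- sapply(seq_len(reps), function(i)
  system.time(r1 <- greaterOf(x, y))[["elapsed"]])
t.ifelse <- sapply(seq_len(reps), function(i)
  system.time(r2 <- ifelse(x > y, x, y))[["elapsed"]])

# Check that both approaches give the same answer
r1 <- greaterOf(x, y)
r2 <- ifelse(x > y, x, y)
agree <- isTRUE(all.equal(r1, r2))
```

Printing t.greaterOf and t.ifelse side by side shows both the run-to-run variation and the relative speed of the two approaches.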
[additional parts of the original message not relevant to this
discussion have been removed]