[R] problem with do.call or how to speed code avoiding for() loops [SUMMARY]
Ernesto Jardim
ernesto at ipimar.pt
Fri Apr 12 18:56:29 CEST 2002
Hi
These is the summary of the discussion about do.call posted on Wed,
2002-04-10 at 13:00, by Ernesto Jardim.
The initial problem was about the use of do.call function. The purpose
was to avoid for() loops and speed up code.
Regarding do.call it was referred by Peter Dalgaard that do.call is for
"situations where the argument list of a single call needs to be
constructed from simpler components".
Also Peter said that, to loop over paralel vectors something like lapply
should be used and presented a napply function example.
Thomas Lumley raised the problem that for() loops should only be avoided
if one is using vectorised functions and explained what it means (see
message bellow).
Ernesto Jardim questioned the fact that, if the family of apply
functions are writean entirly in R, then these functions would only be
usefull for simplicity in writing code.
Several messages referred that apply is only R code. Douglas Bates said
that "S Programming" discuss the need to profile the code before
implementing changes, if one wants to make it faster. Thomas referred to
paralel processing and the increase in speed that it will bring to
apply() when it will be implemented.
Prof. Ripley answered this issue saying that apply() just streamlines a
for() loop but lapply() is faster (it makes a call to compiled code) and
its use is encouradge. Also stating that apply() is a matter of style.
All relevant messages are pasted bellow. If something is wrong in this
summary please let me know and I'll correct it.
Regards
EJ
---------------------------------------
Starting message: On Wed, 2002-04-10 at 13:00, Ernesto Jardim wrote:
Hi
I'm writing a function that uses four parameters (scalars) and I need to
run it in an iterative process (the parameters vary to find the minimum
RSS).
I don't want to use loops and so tried the do.call function. However it
didn't work. My understanding is that the do.call simple runs the
function replacing the arguments (scalars by vectors), instead of runing
the function for each set of scalars in the list, what I need.
Can you please tell me if there is another way of doing it whithout
using the for loop ?
Thanks
EJ
ps: Follows an example (off course the example doesn't make much sense
but describes the problem).
> fun
function(a,b){
vec <- rnorm(25)
res <- a*vec^b
res
}
> fun(2,3)
[1] 7.006278e+00 3.515010e-01 7.989718e+00 -3.377766e-02
-1.879471e-02
[6] -2.920680e-01 1.174834e+00 -1.088638e-03 6.448725e+00
2.591805e+00
[11] -4.313672e-04 -9.171867e-03 -6.793569e+00 -2.480562e+01
-1.514828e+01
[16] -1.259896e-01 -7.504192e-02 6.647855e-02 5.609645e-01
1.093114e-01
[21] 1.802123e+00 7.650033e-03 -3.534951e+00 -2.028473e-03
-2.837360e+01
> do.call("fun",list(a=c(1:6),b=rnorm(6)))
[1] 1.4766338 NaN 3.0214852 3.8132530 0.2753699 NaN
[7] NaN NaN 2.9998547 NaN NaN 6.3050385
[13] 0.5970596 0.8722498 2.9931344 4.0664852 NaN NaN
[19] 2.8121803 NaN 2.9989127 NaN NaN NaN
[25] 14.4631627
Warning messages:
1: longer object length
is not a multiple of shorter object length in: vec^b
2: longer object length
is not a multiple of shorter object length in: a * vec^b
>
---------------------------------------
Peter Dalgaard:
That's not what do.call does. It is for situations where the argument
list of a single call needs to be constructed from simpler components.
Your example is equivalent to fun(a=c(1:6), b=rnorm(6))
The loop over multiple parallel vectors is only doable via something
like lapply(1:6, function(i)fun(a[i],b[i]))
However, I recently played with this and got as far as this:
napply <-
function(..., FUN) {
FUN <- match.fun(FUN)
x <- list(...)
lens <- sapply(x,length)
len <- max(lens)
if (any(lens != len))
x <- lapply(x, rep, length=len)
tuples <- lapply(seq(length=len), function(i)lapply(x,"[", i))
sapply(tuples, function(t)eval(as.call(c(FUN,t))))
}
> napply(a=c(1:6),b=rnorm(6), FUN=fun)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1.0259135 NaN 3.003882 NaN NaN 20.299212
[2,] NaN 1.977696 3.026111 NaN 3.951746 19.107481
[3,] 1.1840499 2.024837 NaN 8.289768 NaN 7.479917
[4,] 0.9756922 2.003576 NaN NaN 4.236000 NaN
[5,] 1.0010550 2.006045 NaN NaN NaN 1302.330425
[6,] NaN NaN NaN 2.472650 NaN NaN
[7,] NaN 2.094956 NaN NaN NaN 3.685879
[8,] 0.8646628 NaN 2.993435 NaN 3.369501 NaN
[9,] NaN 2.044915 3.006433 6.426090 6.123980 19.235790
[10,] 1.6051736 NaN 3.011986 NaN 3.638641 NaN
....
> > fun
> function(a,b){
>
> vec <- rnorm(25)
> res <- a*vec^b
> res
>
> }
> > fun(2,3)
> [1] 7.006278e+00 3.515010e-01 7.989718e+00 -3.377766e-02
> -1.879471e-02
> [6] -2.920680e-01 1.174834e+00 -1.088638e-03 6.448725e+00
> 2.591805e+00
> [11] -4.313672e-04 -9.171867e-03 -6.793569e+00 -2.480562e+01
> -1.514828e+01
> [16] -1.259896e-01 -7.504192e-02 6.647855e-02 5.609645e-01
> 1.093114e-01
> [21] 1.802123e+00 7.650033e-03 -3.534951e+00 -2.028473e-03
> -2.837360e+01
> > do.call("fun",list(a=c(1:6),b=rnorm(6)))
> [1] 1.4766338 NaN 3.0214852 3.8132530 0.2753699 NaN
> [7] NaN NaN 2.9998547 NaN NaN 6.3050385
> [13] 0.5970596 0.8722498 2.9931344 4.0664852 NaN NaN
> [19] 2.8121803 NaN 2.9989127 NaN NaN NaN
> [25] 14.4631627
> Warning messages:
> 1: longer object length
> is not a multiple of shorter object length in: vec^b
> 2: longer object length
> is not a multiple of shorter object length in: a * vec^b
> >
---------------------------------------
Thomas Lumley:
And why wouldn't you want to use the for() loop? Unless your function
is vectorised you're not going to gain anything by getting rid of the
for() loop.
Definition of vectorised function by Thomas:
Many R functions can operate on a vector of parameter values, eg
log(10,c(2,e,10)) gives the log of 10 to base 2, e, and 10
If your function can do this, you can construct a set of vectors
containing all your parameter values (expand.grid() is useful for this)
and evaluate your function once.
This can be faster than for() loops when much of the iteration is done
in compiled code. If the iteration has to be done in interpreted code
then you can't really speed up the for() loops. You can hide the loops
with the apply() functions, which may make your code more readable, but
it won't typically speed it up.
---------------------------------------
Ernesto Jardim:
This was not my understanding. I thougth that if you can use functions
like apply and similar instead of for loops your code will be faster.
Basicly relying on these functions code which is (should be) optimized
for speed.
If what you're saying is true then using functions like apply is a
matter of simplicity and not speeding up the code.
Is this correct ?
---------------------------------------
Douglas Bates:
Yes.
If you examine the apply function you will see that the bulk of the
work is done in a loop
if (length(d.call) < 2) {
if (length(dn.call))
dimnames(newX) <- c(dn.call, list(NULL))
for (i in 1:d2) ans[[i]] <- FUN(newX[, i], ...)
}
else for (i in 1:d2) ans[[i]] <- FUN(array(newX[, i], d.call,
dn.call), ...)
In their book "S Programming" (Springer, 2000) Venables and Ripley
discuss general strategies for writing R functions and for making them
faster. One general principle is to profile the code before
implementing changes. The manual "Writing R Extensions" has a section
on "Profiling R code" which is highly recommended.
---------------------------------------
Thomas Lumley:
Yes. As you can easily verify [and always should verify if you're doing
optimisation], the apply commands are rarely faster than their for()
loop equivalents. They can be slower.
The speed advantage of apply is partly mythical -- there's never been
that much advantage -- and partly historical, as in some versions of
S-PLUS 3.x apply was often faster for complicated reasons due to memory
management.
The real point of the apply() family is to suppress unnecessary loop
variables and make your code tidier. If we ever get parallel processing
then apply() could really become faster, but that's not going to happen
any time soon.
---------------------------------------
Brian Ripley:
I think that is a little pessimistic. It is true for apply() in R, which
just streamlines a for() loop, and also does things you may not want.
However, lapply is an internal function (written by me) because it is
sometimes a lot faster, and in my experiments never slower.
lapply() was a lot faster in S-PLUS 3.4. It was often slower than for()
in 5.0, hence a lot of consternation. There *are* a lot of myths
about,
but not all in one direction. As others have said, `S Programming'
tries
to give a balanced view across 3 different S implementations, and
profiling can be a great tool in optimizing code (it can be misleading
too, but rarely when it matters).
Summary: lapply is encouraged. apply is a matter of style. Test out
whatever you do to see if it is really worthwhile.
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
More information about the R-help
mailing list