[R] looping over lapply calls

Prof Brian D Ripley ripley at stats.ox.ac.uk
Sun Jun 4 08:03:34 CEST 2000

On Sat, 3 Jun 2000, Ramon Diaz-Uriarte wrote:

> Dear All,
> I am writing a function to analyze simulated data. For each subset of data
> (those with the same simulation counter), I have to fit a linear model and
> output the coefficients and the F's from drop1 (marginal F tests). I have
> tried three approaches: using lapply, using a for loop, and looping over
> blocks, where within each block I use lapply (following the suggestion in "S
> programming", pp. 156 and 174). The latter is often the fastest method
> (execution time can be less than half that of the other methods).  I am wondering: 
> a) why exactly is that the case? (Is it related to the "split" in lapply or
> the "matrix(unlist(etc))" in my function)
> b) is there some rule of thumb to choose the size of the block over
> which to use lapply?

I'm afraid this all depends on the exact S engine in use and the amount of
memory available.  However,

1) The differences are usually less in R than in S-PLUS, as the examples
in `S Programming' suggest, mainly because R does a better job with the
naive approaches, but also because it does a worse job on the most
elegant approaches in S-PLUS (which may differ by version).

2) I suspect such comparisons will change when R changes its memory
management (minor changes in 1.1, major ones in 1.2?), as they have over
successive versions of S-PLUS. They certainly changed when I wrote an
internal lapply for R, but I have not put my attempts at an internal
apply live, as they seem to make too little difference to measure
accurately.

3) In S engines the reasons are related to minimizing the number of pieces
of memory in use as well as their size.  Using blocks allows the memory to
be cleaned up at the end of each block.  R does not have delayed commitment
to the same extent, and garbage-collects memory only when full (or when
asked).  So at least one issue is how much is in use when memory management
occurs, and both the size and the number of R objects are relevant.  As
memory management is about to change, I think the only way out is to
experiment, including with the setting of the heap size in R.
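(For concreteness, the three approaches under comparison might be sketched
as below.  This is only an illustration with made-up data: `nsim`, `block`,
the formula and `fit1` are placeholders, not the poster's actual code.)

```r
## Made-up simulated data; 'sim' is the simulation counter.
set.seed(1)
nsim <- 100
d <- data.frame(sim = rep(seq_len(nsim), each = 20),
                x = rnorm(nsim * 20), y = rnorm(nsim * 20))

## Placeholder per-subset fit (the real one would also call drop1).
fit1 <- function(di) coef(lm(y ~ x, data = di))

## 1) lapply over the split data
res1 <- lapply(split(d, d$sim), fit1)

## 2) an explicit for loop
res2 <- vector("list", nsim)
for (i in seq_len(nsim)) res2[[i]] <- fit1(d[d$sim == i, ])

## 3) looping over blocks, with lapply inside each block
block <- 10
res3 <- vector("list", nsim)
for (b in seq(1, nsim, by = block)) {
    idx <- b:(b + block - 1)
    db <- d[d$sim %in% idx, ]
    res3[idx] <- lapply(split(db, db$sim), fit1)
}
```

All three produce the same fits; approach 3 simply bounds how many
intermediate objects exist between clean-ups.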

> P.S. For completeness, I include below the core of the function I am using;
> comments most welcome.

(If you had had some spaces around operators and indented consistently
I might have been able to comment.  Set options(keep.source=FALSE) and
read this into R and out again to get a consistent layout. You can
then tidy it up in ESS.)
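(As a small illustration of that tip, with a hypothetical messy
one-liner `f`, assuming only base R:)

```r
## With keep.source = FALSE, R stores only the parsed code, so printing a
## function deparses it with consistent spacing and indentation.
options(keep.source = FALSE)
f <- eval(parse(text = "function(x){y<-x^2;y+1}"))  # messy original layout
print(f)  # deparsed with standard layout, e.g. 'y <- x^2' on its own line
```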

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
