[R] Re: [S] scalability
Prof Brian Ripley
ripley at stats.ox.ac.uk
Sat Mar 27 22:32:35 CET 2004
On Sat, 27 Mar 2004, Patrick Burns wrote:
> I think this is an interesting discussion -- I've learned from both
> Steve's and Brian's comments, and I'm broadening it to R-help
> since I think others will be interested as well.
>
> The problem up for comment is:
>
> result <- apply(array.3D, 1:2, sum)
>
> Where array.3D is 3000 by 300 by 3.
...
> Prof Brian Ripley wrote:
>
> BR> There are almost always pros and cons with these issues. S's sum() is an
> BR> S4 generic whereas R's is internal *unless* you define an S4 method for
> BR> it (which S-PLUS has already done). S needs to create several frames for
> BR> what is a nested set of function calls -- 1280b looks modest for that.
> BR>
> BR> Also, S has an ability to back out calculations that R does not, and that
> BR> costs memory (and can have benefits).
> BR>
> BR> We know there are overheads in making functions generic, especially
> BR> S4-generic, but then there are benefits too. I am not sure designers who
> BR> add features take enough account of the costs.
>
> Using R 1.8.1 (precompiled) on SuSE Linux with a Xeon 2.4GHz and 1G of
> memory:
>
> set.seed(2)
> jja <- array(rnorm(3000*300*3), c(3000, 300, 3))
> gc()
> system.time(jjsa <- apply(jja, 1:2, sum)) # takes 30 seconds
>
> sumS3 <- function(x, ...) UseMethod("sumS3")
> sumS3.default <- function(x, ...) sum(x, ...)
> gc()
> system.time(jjsa3 <- apply(jja, 1:2, sumS3)) # takes 65 seconds
sum is already S3-generic in R, at C level, so a plain (non-generic)
wrapper would be a better test. BTW, repeating this speeds things up quite
a bit as the gc limits get tuned. I get (Athlon 2600) 23-23 secs basic,
23-25 secs for a simple wrapper and 49 secs for sumS4.
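The wrapper comparison suggested here could be sketched roughly as follows. The name sumWrap and the reduced array size are assumptions made so the example runs quickly; they are not from the thread, and the timings will not match the figures quoted above.

```r
# Smaller array than the 3000 x 300 x 3 one in the thread (an assumption
# for illustration, so that the example finishes quickly).
set.seed(2)
jja <- array(rnorm(300 * 30 * 3), c(300, 30, 3))

# Hypothetical plain wrapper: one extra R-level function call, no dispatch.
sumWrap <- function(x, ...) sum(x, ...)

# Repeat the timings a few times, since later runs may be faster once the
# garbage collector's limits have adjusted.
for (i in 1:3) {
  t_base <- system.time(res_base <- apply(jja, 1:2, sum))
  t_wrap <- system.time(res_wrap <- apply(jja, 1:2, sumWrap))
  cat("run", i,
      "basic:", t_base[["elapsed"]],
      "wrapper:", t_wrap[["elapsed"]], "\n")
}
```

Comparing the wrapper timing with the sumS3 and sumS4 timings separates the cost of one extra function call from the cost of method dispatch.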
> sumS4 <- function(x, ...) standardGeneric("sumS4")
> setMethod("sumS4", signature(x="numeric"), function(x, ...) sum(x, ...))
> gc()
> system.time(jjsa4 <- apply(jja, 1:2, sumS4)) # takes 58 seconds
>
> Questions:
>
> It looks to me like the penalty for making the functions generic is
> similar to one extra function call. Does the penalty grow as there
> are more methods?
Yes, probably quite a lot. AFAIK there is no caching of selected methods
going on, although it is hard to be sure.
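One way to probe whether the penalty grows with the number of methods is to time the same call before and after registering extra methods that are never selected. This is only a sketch: the generic is set up with setGeneric(), the extra methods and the reduced array size are assumptions for illustration, and results on a current R may well differ from those on R 1.8.1.

```r
library(methods)

# Hypothetical S4 generic with a single method, as in the test above.
setGeneric("sumS4", function(x, ...) standardGeneric("sumS4"))
setMethod("sumS4", signature(x = "numeric"), function(x, ...) sum(x, ...))

# Reduced array size (assumption) so the example runs quickly.
set.seed(2)
jja <- array(rnorm(300 * 30 * 3), c(300, 30, 3))

t_one <- system.time(res_one <- apply(jja, 1:2, sumS4))

# Register extra methods for classes that never occur in this run.
setMethod("sumS4", signature(x = "character"),
          function(x, ...) sum(nchar(x), ...))
setMethod("sumS4", signature(x = "logical"),
          function(x, ...) sum(x, ...))
setMethod("sumS4", signature(x = "list"),
          function(x, ...) sum(unlist(x), ...))

t_many <- system.time(res_many <- apply(jja, 1:2, sumS4))

cat("one method:", t_one[["elapsed"]],
    " four methods:", t_many[["elapsed"]], "\n")
```

The results are the same in both runs; only the elapsed times are of interest, and they should be repeated several times before drawing any conclusion.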
> Are there other types of penalties for making
> a function generic?
Memory usage. If you call gcinfo(TRUE) you will see the cons cell usage
growing during the run.
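A minimal sketch of watching memory use during the S3 test, assuming a reduced array size so that it runs quickly (with an array this small there may be few or no collections to report; the full 3000 x 300 x 3 array shows more):

```r
# S3 generic and default method as defined in the test above.
sumS3 <- function(x, ...) UseMethod("sumS3")
sumS3.default <- function(x, ...) sum(x, ...)

# Reduced array size (assumption) for a quick run.
set.seed(2)
jja <- array(rnorm(300 * 30 * 3), c(300, 30, 3))

old <- gcinfo(TRUE)            # report each garbage collection as it happens
res <- apply(jja, 1:2, sumS3)
gcinfo(old)                    # restore the previous setting

gc()                           # the "Ncells" row counts cons cells
```

Running the same lines with sum in place of sumS3 gives a baseline to compare the reported cell usage against.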
> Is the test with sumS4 still an unfair comparison with S-PLUS?
Yes, somewhat. You only have one method.
> Are things better with S-PLUS 6.2?
Apparently not. Even calling the default method directly seems very slow.
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595