[R] applying data generating function
Prof Brian Ripley
ripley at stats.ox.ac.uk
Mon Mar 8 18:33:54 CET 2004
On Mon, 8 Mar 2004, Spencer Graves wrote:
> With "gc()" right before each call to "proc.time", as Brian Ripley
> and Gabor Grothendieck suggested, the times were substantially more
> stable. For the for loop, extending the vector with each of 1e5
> iterations, I got 181.25, 181.27, 182.72, 182.44, and 182.56. The
> averages of the last 3 of these tests are as follows:
>
>                            10  100 1000 10000  1e+05
> for loop                    0 0.01 0.05  1.13 182.14
> gen e + for loop            0 0.00 0.03  0.26   2.58
> create storage + for loop   0 0.00 0.04  0.39   3.94
> sapply                      0 0.00 0.03  0.32   4.05
> replicate                   0 0.00 0.03  0.31   3.55
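(For concreteness, the five approaches being timed presumably look roughly
like the sketches below. The original code is not quoted, so the x <<- f(x)
updates in the sapply/replicate versions, and the preallocation in the
"gen e" version, are assumptions suggested by the reported times.)

f <- function(x) 3.8 * x * (1 - x) + rnorm(1, 0, .001)
N <- 1e4; x0 <- .1

# for loop, extending the vector at each iteration
v <- c(); x <- x0
for (i in 1:N) { x <- f(x); v <- append(v, x) }

# "gen e": draw all the noise up front (and preallocate)
e <- rnorm(N, 0, .001)
v <- numeric(N); x <- x0
for (i in 1:N) { x <- 3.8 * x * (1 - x) + e[i]; v[i] <- x }

# create storage first, then fill it in place
v <- numeric(N); x <- x0
for (i in 1:N) { x <- f(x); v[i] <- x }

# sapply / replicate, advancing x by superassignment
x <- x0; v <- sapply(1:N, function(i) x <<- f(x))
x <- x0; v <- replicate(N, x <<- f(x))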
>
> Without "gc()", I got 192.05, 182.02, 126.04, 130.30, and 118.64 for
> extending the vector with each for loop iteration.
>
> Three more observations about this:
>
> 1. Without "gc()", the times started higher but declined by
> roughly a third. This suggests that R may actually be storing
> intermediate "semi-compiled" code in "garbage" and using it when the
> situation warrants -- but "gc()" discards it.
I don't see anything in the code to allow for that possibility.
I believe it's down to the vagaries of garbage collection, and in
particular to how the tuning of the limits and the mix of level 0, 1 and 2
gc's gets adjusted during runs. Here is a small experimental setup:
foo <- function(N)
{
    set.seed(123)
    gct <- gc.time()                 # GC time used so far
    res <- system.time({
        # noisy logistic map, one rnorm() call per step
        f <- function(x.) 3.8 * x. * (1 - x.) + rnorm(1, 0, .001)
        v <- c()                     # grown by one element per iteration
        x <- .1                      # starting point
        for (i in 1:N) { x <- f(x); v <- append(v, x) }
    })
    gct <- gc.time() - gct           # GC time used during the run
    cbind(res, gct)
}
gc.time(TRUE)                        # enable timing of garbage collections
gc()                                 # start from a freshly collected heap
> foo(1e4)
        res    gct
[1,]   1.39   1.12
[2,]   0.01   0.92
[3,]   1.41   1.07
[4,]   0.00   0.00
[5,]   0.00   0.00
> foo(1e5)
        res    gct
[1,] 218.68 242.86
[2,]  19.98 162.12
[3,] 238.83 246.10
[4,]   0.00   0.00
[5,]   0.00   0.00
(rows 1-3 are user, system and elapsed times in seconds; rows 4-5 are
child-process times)
so most (if not more than all) of the time is going on garbage collection:
something like 18000 gc's in the second run.
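(One way to watch collections happening, and to arrive at a count like the
18000 above, is gcinfo(), which prints a line at every gc. A minimal
sketch:)

gcinfo(TRUE)     # report each garbage collection as it happens
x <- .1; v <- c()
for (i in 1:1000) { x <- 3.8 * x * (1 - x) + rnorm(1, 0, .001); v <- append(v, x) }
gcinfo(FALSE)    # switch the per-collection reports back off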
> 2. Increasing N from 1e4 to 1e5 increased the time NOT by a
> factor of 10 but by a factor of 161 = 182/1.13 when the length of the
> vector was extended in each iteration.
Right, but 9/10 of those additional allocations/garbage collections are of
longer objects than before, and so they will take more time. In
particular, objects of non-small size are directly allocated and freed, so
this will also depend on the speed of your malloc. How the time to
allocate n bytes depends on n will be very system-specific.
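(A back-of-envelope check on the superlinear growth, assuming each append
copies the whole vector: growing to length N one element at a time copies
about N*(N+1)/2 elements in total, so 10 times the iterations means
roughly 100 times the copying, before any allocator effects are counted.)

N <- c(1e4, 1e5)
copied <- N * (N + 1) / 2    # total elements copied when growing one at a time
copied[2] / copied[1]        # ~100: 10x the iterations, ~100x the copy work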
--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595