[R] Quirks with system.time and simulations
Roger D. Peng
rpeng at jhsph.edu
Mon Jun 14 02:50:45 CEST 2004
I think the first run can be much slower because a garbage
collection happens during it. R-devel has a `gcFirst' argument to
system.time() which (I think) forces a garbage collection before
timing starts.
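
A minimal sketch of the idea, assuming all you want is to keep a pending collection from being charged to the timed expression. The helper name time_after_gc and its arguments are made up for illustration; it calls gc() explicitly before each run. Where system.time() has the gcFirst argument, system.time(expr, gcFirst = TRUE) should do the same in one call.

```r
# Hypothetical helper (not part of R): collect garbage, then time a call.
time_after_gc <- function(expr_fun, reps = 3) {
  vapply(seq_len(reps), function(i) {
    gc()                                   # force a collection before timing
    system.time(expr_fun())[["elapsed"]]   # keep only the elapsed seconds
  }, numeric(1))
}

x <- runif(1e5)
elapsed <- time_after_gc(function() sort(x))
print(elapsed)
```

If the first value is still clearly larger than the rest, something other than garbage collection is probably involved.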
-roger
Patrick Connolly wrote:
> I tried the code that Richard O'Keefe posted last week, to wit:
>
> library(chron)
> ymd.to.POSIXlt <-
> function (y, m, d) as.POSIXlt(chron(julian(y=y, x=m, d=d)))
> n <- 100000
> y <- sample(1970:2004, n, replace=TRUE)
> m <- sample(1:12, n, replace=TRUE)
> d <- sample(1:28, n, replace=TRUE)
> system.time(ymd.to.POSIXlt(y, m, d))
> [1] 8.78 0.10 31.76 0.00 0.00
> system.time(as.POSIXlt(paste(y,m,d, sep="-")))
> [1] 14.64 0.13 53.30 0.00 0.00
>
>
> On a somewhat newer machine, I got
>
> $ R --vanilla
>
> R : Copyright 2004, The R Foundation for Statistical Computing
> Version 1.9.0 (2004-04-12), ISBN 3-900051-00-3
>
> [...]
>
>
>
> > library(chron)
> > ymd.to.POSIXlt <-
> +     function (y, m, d) as.POSIXlt(chron(julian(y=y, x=m, d=d)))
> > n <- 100000
> > y <- sample(1970:2004, n, replace=TRUE)
> > m <- sample(1:12, n, replace=TRUE)
> > d <- sample(1:28, n, replace=TRUE)
> > system.time(ymd.to.POSIXlt(y, m, d))
> [1] 1.67 0.24 2.01 0.00 0.00
> > system.time(as.POSIXlt(paste(y,m,d, sep="-")))
> [1] 3.06 0.02 3.08 0.00 0.00
>
>
> But then I tried a few more times...
>
>
> > system.time(ymd.to.POSIXlt(y, m, d))
> [1] 1.09 0.04 1.13 0.00 0.00
> > system.time(ymd.to.POSIXlt(y, m, d))
> [1] 1.11 0.09 1.20 0.00 0.00
>
>
> The second time is a lot faster, but subsequent ones don't "improve further".
>
> But with the "standard" function,
>
>
> > system.time(as.POSIXlt(paste(y,m,d, sep="-")))
> [1] 2.64 0.02 2.66 0.00 0.00
> > system.time(as.POSIXlt(paste(y,m,d, sep="-")))
> [1] 2.82 0.03 2.85 0.00 0.00
>
> ... it does improve slightly, but by a lot less.
>
>
> THEN
>
> If I compare the two methods in the reverse order,
>
>
> $ R --vanilla
>
> R : Copyright 2004, The R Foundation for Statistical Computing
> Version 1.9.0 (2004-04-12), ISBN 3-900051-00-3
>
> [....]
>
>
>
> > library(chron)
> > ymd.to.POSIXlt <-
> +     function (y, m, d) as.POSIXlt(chron(julian(y=y, x=m, d=d)))
> > n <- 100000
> > y <- sample(1970:2004, n, replace=TRUE)
> > m <- sample(1:12, n, replace=TRUE)
> > d <- sample(1:28, n, replace=TRUE)
> > system.time(as.POSIXlt(paste(y,m,d, sep="-")))
> [1] 3.66 0.02 3.76 0.00 0.00
> > system.time(ymd.to.POSIXlt(y, m, d))
> [1] 1.65 0.05 1.70 0.00 0.00
> > system.time(as.POSIXlt(paste(y,m,d, sep="-")))
> [1] 2.59 0.02 2.61 0.00 0.00
> > system.time(as.POSIXlt(paste(y,m,d, sep="-")))
> [1] 2.73 0.00 2.74 0.00 0.00
> > system.time(ymd.to.POSIXlt(y, m, d))
> [1] 1.29 0.01 1.30 0.00 0.00
> > system.time(ymd.to.POSIXlt(y, m, d))
> [1] 0.94 0.00 0.94 0.00 0.00
> > system.time(ymd.to.POSIXlt(y, m, d))
> [1] 1.06 0.01 1.07 0.00 0.00
>
>
>
> It seems as though the first simulation makes it "easier" for
> subsequent simulations of the same type AND for simulations of a
> somewhat different type. The degree to which it "helps" varies
> according to just what is being run (no surprise there). What I can't
> figure out is what makes the second and subsequent runs quicker.
>
> I even tried doing a gc() and setting seeds before each run to make a
> more direct comparison, but it made no difference other than making the
> timings slightly less variable. I have seen a similar phenomenon in other
> types of simulations.
>
> In the case of this code, it makes no difference whether n is 100 or
> 10000000. Would that be attributable to lazy evaluation?
>
>
>
> > version
>          _
> platform i686-pc-linux-gnu
> arch     i686
> os       linux-gnu
> system   i686, linux-gnu
> status
> major    1
> minor    9.0
> year     2004
> month    04
> day      12
> language R
>
>
> It's not exactly a problem, but it could have a bearing on comparisons
> of processing times, which come up from time to time. In the
> comparison that gave rise to the code above, the order would have made
> a substantial difference to the perceived effectiveness of Richard's
> code.
>
>
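
On the point that the order of the comparison changes the apparent result: one way to reduce that, sketched here under some assumptions, is to run each method once untimed as a warm-up, then time several repetitions in a shuffled order and compare medians. The helper compare_timings is hypothetical, and ISOdate() from base R stands in for the chron-based function so that the sketch needs no extra packages; n is kept small so it runs quickly.

```r
# Hypothetical helper (not from the thread): time several functions with a
# warm-up pass and a shuffled run order, and report median elapsed seconds.
compare_timings <- function(funs, reps = 5) {
  for (f in funs) f()                      # warm-up: pay one-off costs untimed
  out <- matrix(NA_real_, nrow = reps, ncol = length(funs),
                dimnames = list(NULL, names(funs)))
  for (i in seq_len(reps)) {
    for (nm in sample(names(funs))) {      # shuffle the order on every repetition
      gc()                                 # collect before each timed run
      out[i, nm] <- system.time(funs[[nm]]())[["elapsed"]]
    }
  }
  apply(out, 2, median)
}

n <- 2000
y <- sample(1970:2004, n, replace = TRUE)
m <- sample(1:12, n, replace = TRUE)
d <- sample(1:28, n, replace = TRUE)

res <- compare_timings(list(
  paste   = function() as.POSIXlt(paste(y, m, d, sep = "-"), tz = "UTC"),
  isodate = function() as.POSIXlt(ISOdate(y, m, d), tz = "UTC")
))
print(res)
```

This does not explain why the first run is slower; it only keeps that first-run cost from being charged to whichever method happens to go first.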
--
Roger D. Peng
http://www.biostat.jhsph.edu/~rpeng/