[R] Quirks with system.time and simulations
Liaw, Andy
andy_liaw at merck.com
Mon Jun 14 03:24:28 CEST 2004
I wonder if there's also an effect of the CPU cache...
Andy
> From: Roger D. Peng
>
> I think the first time is potentially much slower because of a
> garbage collection. R-devel has a flag `gcFirst' for
> system.time() which (I think) forces a garbage collection before
> timing.
>
> -roger
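A minimal sketch of Roger's suggestion, assuming a later R release in which `system.time()` has the `gcFirst` argument (the thread notes it was only in R-devel at the time, not in 1.9.0). It times the base-R conversion from the thread with and without a forced garbage collection first. The chron-based function is left out so the example runs with base R only.

```r
# Sketch: time the same conversion with and without a forced garbage
# collection first. Assumes an R version where system.time() accepts
# `gcFirst` (it did not exist in R 1.9.0).
n <- 100000
set.seed(1)
y <- sample(1970:2004, n, replace = TRUE)
m <- sample(1:12, n, replace = TRUE)
d <- sample(1:28, n, replace = TRUE)

# The base-R conversion from the thread (chron is not needed here).
convert <- function() as.POSIXlt(paste(y, m, d, sep = "-"))

t_gc   <- system.time(convert(), gcFirst = TRUE)   # gc() runs before timing
t_nogc <- system.time(convert(), gcFirst = FALSE)  # timing starts immediately

# Keep user, system and elapsed seconds for each variant.
print(rbind(gc_first = t_gc[1:3], no_gc = t_nogc[1:3]))
```

Because the first call also absorbs any warm-up cost, swapping the two timing lines shows how much of the difference comes from order rather than from garbage collection.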
>
> Patrick Connolly wrote:
> > I tried the code that Richard O'Keefe posted last week, to wit:
> >
> > library(chron)
> > ymd.to.POSIXlt <-
> > function (y, m, d) as.POSIXlt(chron(julian(y=y, x=m, d=d)))
> > n <- 100000
> > y <- sample(1970:2004, n, replace=TRUE)
> > m <- sample(1:12, n, replace=TRUE)
> > d <- sample(1:28, n, replace=TRUE)
> > system.time(ymd.to.POSIXlt(y, m, d))
> > [1] 8.78 0.10 31.76 0.00 0.00
> > system.time(as.POSIXlt(paste(y,m,d, sep="-")))
> > [1] 14.64 0.13 53.30 0.00 0.00
> >
> >
> > On a somewhat newer machine, I got
> >
> > $ R --vanilla
> >
> > R : Copyright 2004, The R Foundation for Statistical Computing
> > Version 1.9.0 (2004-04-12), ISBN 3-900051-00-3
> >
> > [...]
> >
> >> library(chron)
> >> ymd.to.POSIXlt <-
> > + function (y, m, d) as.POSIXlt(chron(julian(y=y, x=m, d=d)))
> >> n <- 100000
> >> y <- sample(1970:2004, n, replace=TRUE)
> >> m <- sample(1:12, n, replace=TRUE)
> >> d <- sample(1:28, n, replace=TRUE)
> >>
> >> system.time(ymd.to.POSIXlt(y, m, d))
> > [1] 1.67 0.24 2.01 0.00 0.00
> >> system.time(as.POSIXlt(paste(y,m,d, sep="-")))
> > [1] 3.06 0.02 3.08 0.00 0.00
> >
> >
> > But then I tried a few more times...
> >
> >> system.time(ymd.to.POSIXlt(y, m, d))
> > [1] 1.09 0.04 1.13 0.00 0.00
> >> system.time(ymd.to.POSIXlt(y, m, d))
> > [1] 1.11 0.09 1.20 0.00 0.00
> >
> > The second time is a lot faster, but subsequent ones don't
> > "improve further".
> >
> > But with the "standard" function,
> >
> >> system.time(as.POSIXlt(paste(y,m,d, sep="-")))
> > [1] 2.64 0.02 2.66 0.00 0.00
> >> system.time(as.POSIXlt(paste(y,m,d, sep="-")))
> > [1] 2.82 0.03 2.85 0.00 0.00
> >
> > ... it does improve slightly but rather a lot less.
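One way to keep a slow first call from skewing a comparison, sketched under the assumption that a single discarded warm-up run is enough: run the expression once untimed, then report the median of several timed repetitions. `time_reps` is a hypothetical helper written for this illustration, not part of base R, and it assumes an R version whose `system.time()` accepts `gcFirst`.

```r
# Hypothetical helper: discard warm-up runs, then time `reps` repetitions.
time_reps <- function(f, reps = 5, warmup = 1) {
  for (i in seq_len(warmup)) f()            # warm-up calls, not recorded
  elapsed <- numeric(reps)
  for (i in seq_len(reps)) {
    elapsed[i] <- system.time(f(), gcFirst = TRUE)[["elapsed"]]
  }
  elapsed
}

n <- 100000
set.seed(42)
y <- sample(1970:2004, n, replace = TRUE)
m <- sample(1:12, n, replace = TRUE)
d <- sample(1:28, n, replace = TRUE)

times <- time_reps(function() as.POSIXlt(paste(y, m, d, sep = "-")))
print(times)
cat("median elapsed:", median(times), "s\n")
```

The median is used rather than the mean so that one unusually slow repetition does not dominate the summary.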
> >
> >
> > THEN
> >
> > If I compare the two methods in the reverse order,
> >
> >
> > $ R --vanilla
> >
> > R : Copyright 2004, The R Foundation for Statistical Computing
> > Version 1.9.0 (2004-04-12), ISBN 3-900051-00-3
> >
> > [....]
> >
> >> library(chron)
> >> ymd.to.POSIXlt <-
> > + function (y, m, d) as.POSIXlt(chron(julian(y=y, x=m, d=d)))
> >> n <- 100000
> >> y <- sample(1970:2004, n, replace=TRUE)
> >> m <- sample(1:12, n, replace=TRUE)
> >> d <- sample(1:28, n, replace=TRUE)
> >> system.time(as.POSIXlt(paste(y,m,d, sep="-")))
> > [1] 3.66 0.02 3.76 0.00 0.00
> >> system.time(ymd.to.POSIXlt(y, m, d))
> > [1] 1.65 0.05 1.70 0.00 0.00
> >>
> >> system.time(as.POSIXlt(paste(y,m,d, sep="-")))
> > [1] 2.59 0.02 2.61 0.00 0.00
> >> system.time(as.POSIXlt(paste(y,m,d, sep="-")))
> > [1] 2.73 0.00 2.74 0.00 0.00
> >> system.time(ymd.to.POSIXlt(y, m, d))
> > [1] 1.29 0.01 1.30 0.00 0.00
> >> system.time(ymd.to.POSIXlt(y, m, d))
> > [1] 0.94 0.00 0.94 0.00 0.00
> >> system.time(ymd.to.POSIXlt(y, m, d))
> > [1] 1.06 0.01 1.07 0.00 0.00
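The order effect shown above can be reduced by warming up every method once, interleaving the timed runs in a random order, and summarizing each method by its median. A sketch under stated assumptions: `method_b` uses base R's `ISOdatetime()` only as a stand-in second method (it is not Richard's chron-based `ymd.to.POSIXlt`), four repetitions per method is an arbitrary choice, and `system.time()` is assumed to accept `gcFirst`.

```r
n <- 100000
set.seed(7)
y <- sample(1970:2004, n, replace = TRUE)
m <- sample(1:12, n, replace = TRUE)
d <- sample(1:28, n, replace = TRUE)

# Two candidate methods; method_b is a base-R stand-in, not the chron version.
methods <- list(
  method_a = function() as.POSIXlt(paste(y, m, d, sep = "-")),
  method_b = function() as.POSIXlt(ISOdatetime(y, m, d, 0, 0, 0))
)

# One untimed warm-up call of each method, then a shuffled schedule
# so that neither method systematically runs first.
for (f in methods) f()
reps     <- 4
schedule <- sample(rep(names(methods), reps))

elapsed <- vapply(schedule, function(nm) {
  system.time(methods[[nm]](), gcFirst = TRUE)[["elapsed"]]
}, numeric(1))

# Median elapsed seconds per method.
med <- tapply(elapsed, schedule, median)
print(med)
```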
> >
> >
> >
> > It seems as though the first simulation makes it "easier" for
> > subsequent simulations of the same type AND also for simulations of a
> > somewhat different type. The degree to which it "helps" varies
> > according to just what is being run (no surprise there). What I can't
> > figure out is what is happening that makes the second and subsequent
> > runs quicker.
> >
> > I even tried doing a gc() and setting seeds before each run to make a
> > more direct comparison, but it made no difference other than making
> > the timings slightly less variable. I have seen a similar phenomenon
> > in other types of simulations.
> >
> > In the case of this code, it makes no difference whether n is 100 or
> > 10000000. Would that be attributable to lazy evaluation?
> >
> >
> >
> >> version
> > _
> > platform i686-pc-linux-gnu
> > arch i686
> > os linux-gnu
> > system i686, linux-gnu
> > status
> > major 1
> > minor 9.0
> > year 2004
> > month 04
> > day 12
> > language R
> >
> >
> > It's not exactly a problem, but it could have a bearing on comparing
> > processing times, which is something that happens from time to time.
> > In the comparison that gave rise to the code above, the order would
> > have made a substantial difference to the perceived effectiveness of
> > Richard's code.
> >
> >
>
> --
> Roger D. Peng
> http://www.biostat.jhsph.edu/~rpeng/
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
>
>