[Rd] ISOdate/ISOdatetime performance suggestions, other date/time questions
Sklyar, Oleg (MI London)
osklyar at maninvestments.com
Thu Apr 10 17:32:08 CEST 2008
Small correction to the code quoted below:
# subtract the numeric value to ensure origin is exactly 0 seconds,
# although it will be overwritten when assigning hour
origin = as.POSIXct("1970-01-01") - as.numeric(as.POSIXct("1970-01-01"))
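
The effect of the subtraction can be checked in isolation. A minimal sketch, assuming the named time zone is available on the system (the zone and the variable names are illustrative only):

```r
# as.POSIXct() interprets the string in the given (or local) time zone,
# so away from GMT the numeric value is generally not 0.
origin_local <- as.POSIXct("1970-01-01", tz = "Europe/Berlin")
as.numeric(origin_local)

# Subtracting that offset leaves an origin whose numeric value is exactly 0.
origin_zero <- origin_local - as.numeric(origin_local)
as.numeric(origin_zero)
```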
Dr Oleg Sklyar
Technology Group
Man Investments Ltd
+44 (0)20 7144 3803
osklyar at maninvestments.com
> -----Original Message-----
> From: r-devel-bounces at r-project.org
> [mailto:r-devel-bounces at r-project.org] On Behalf Of Sklyar,
> Oleg (MI London)
> Sent: 10 April 2008 14:52
> To: R-devel at r-project.org
> Subject: [Rd] ISOdate/ISOdatetime performance suggestions,
> other date/time questions
>
> Dear list:
>
> Working with date/times, I have come across the problem that
> ISOdate and ISOdatetime are too slow on large vectors of
> data. I was surprised until I looked at the implementation
> and the man page: "ISOdatetime and ISOdate are convenience
> wrappers for strptime". In other words, they convert the data
> to a character representation first in order to create a
> POSIXlt object, which is then converted to POSIXct. And
> POSIXct, i.e. the number of seconds since 1970, is really
> what one wants most often.
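
The string round trip described above can be made explicit. A minimal sketch, assuming the wrapper behaves as the man page describes (the format string used internally may differ between R versions, and the variable names are illustrative):

```r
# Build the time stamp the way a strptime()-based wrapper would:
# paste the numeric fields into a string, parse it, then convert to POSIXct.
stamp <- paste(2008, 4, 10, 14, 4, 44, sep = "-")
via_strptime <- as.POSIXct(strptime(stamp, "%Y-%m-%d-%H-%M-%OS", tz = "GMT"),
                           tz = "GMT")

# The convenience wrapper yields the same instant.
via_wrapper <- ISOdatetime(2008, 4, 10, 14, 4, 44, tz = "GMT")
as.numeric(via_strptime) == as.numeric(via_wrapper)
```

Every element of a long vector goes through this format-and-parse step, which is where the cost on 50000 values comes from.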
>
> Obviously this is not a bug, but it is really a suboptimal
> implementation of a pretty important function as the example
> below shows.
>
> Now my questions are:
>
> - Any chance that the implementation can be changed in R
> (along the lines suggested below, although tz support would
> need to be added)?
> - Is there a better pure-R (no C) way than that shown below
> to convert to POSIXct?
> - Any idea why, in the example below, fooling R into thinking a
> list is POSIXlt is faster than just creating a POSIXlt by rep
> or seq? It's not a huge difference, but still. Unfortunately
> seq on POSIXlt returns POSIXct anyway, so the class of
> 'origin' is set correctly.
> - Any idea why seq is faster than rep when applied to
> POSIXct? There is hardly anything simpler than rep on double values...
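
On the second question, one pure-R possibility is to skip POSIXlt altogether and compute the seconds arithmetically. A minimal sketch, assuming GMT/UTC time stamps on the Gregorian calendar with no DST or leap-second handling; `secs_since_epoch` is a hypothetical helper, not an existing R function, and it relies on a standard days-from-civil-date calculation:

```r
secs_since_epoch <- function(year, mon, mday, hour = 0, min = 0, sec = 0) {
  # Shift the year so that it starts in March; leap days then fall at year end.
  y   <- year - (mon <= 2)
  era <- y %/% 400                       # 400-year Gregorian cycle
  yoe <- y - era * 400                   # year within the cycle, 0..399
  mp  <- (mon + 9) %% 12                 # March = 0, ..., February = 11
  doy <- (153 * mp + 2) %/% 5 + mday - 1 # day within the shifted year
  doe <- yoe * 365 + yoe %/% 4 - yoe %/% 100 + doy
  days <- era * 146097 + doe - 719468    # days since 1970-01-01
  days * 86400 + hour * 3600 + min * 60 + sec
}

# Cross-check against ISOdatetime() on a few vectorised inputs (GMT only).
y <- c(1970, 2000, 2008, 2008); m <- c(1, 2, 4, 12); d <- c(1, 29, 10, 31)
H <- c(0, 23, 14, 12);          M <- c(0, 59, 4, 30); S <- c(0, 59, 44, 15)
all(secs_since_epoch(y, m, d, H, M, S) ==
    as.numeric(ISOdatetime(y, m, d, H, M, S, tz = "GMT")))
```

The whole computation is plain vector arithmetic on doubles, so no character conversion is involved; time zone support would have to be layered on top.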
>
> Thanks in advance for your comments,
> Oleg
>
> It's common in finance to work with time stamps stored as
> numbers of the form %Y%m%d.%H%M%OS, e.g. 20080410.140444 for
> the time of writing; this is what 'ts' in the example below contains:
>
> ts = 1e4*trunc(rnorm(50000,2008,2)) + 1e2*trunc(runif(50000,1,12)) +
> trunc(runif(50000,1,28)) + 1e-2*trunc(runif(50000,1,24)) +
> 1e-4*trunc(runif(50000,1,60)) + 1e-6*runif(50000,1,60)
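
To make the encoding concrete, a single stamp can be taken apart by hand. A minimal sketch, assuming a plain numeric input; the variable names are illustrative, and floor() is used here for the whole-second part so that fractional seconds of .5 or more do not carry into the seconds digits:

```r
x <- 20080410.140444

date  <- trunc(x)                  # integer part: YYYYMMDD
time  <- round(1e6 * (x %% 1), 2)  # fractional part scaled to HHMMSS.ss
whole <- floor(time)               # HHMMSS without the fraction

fields <- list(year = date %/% 10000,
               mon  = (date %/% 100) %% 100,
               mday = date %% 100,
               hour = whole %/% 1e4,
               min  = (whole %/% 1e2) %% 1e2,
               sec  = whole %% 1e2 + time %% 1)
str(fields)
```

The rounding to two decimals absorbs the representation error of a double of this magnitude, which is on the order of 1e-9 before scaling by 1e6.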
>
> posix.viaISOdate = function(x) {
>   date = trunc(x@.Data)
>   time = round(1e6*x@.Data%%1,2)
>   rtime = round(time)
>   z = list(sec=rtime%%1e2 + time%%1,
>            min=(rtime%/%1e2)%%1e2,
>            hour=rtime%/%1e4,
>            mday=date%%100,
>            mon=(date%/%100)%%100,
>            year=date%/%10000)
>   ISOdate(z$year,z$mon,z$mday,z$hour,z$min,z$sec) # to POSIXct
> }
>
> ## This is just a test of which way of creating a long POSIXlt
> ## object is faster, before the other implementations are given
>
> origin = as.POSIXct("1970-01-01")
>
> mean(sapply(1:25,function(i) system.time(
> as.POSIXlt(rep(origin,600000))
> ))[1,])
> # [1] 0.3972
>
> mean(sapply(1:25,function(i) system.time(
> as.POSIXlt(seq(origin, origin, length.out=600000))
> ))[1,])
> # [1] 0.30528
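
The repeated timing pattern used above can be wrapped in a small helper. A minimal sketch, assuming user CPU time is the quantity of interest (the first row picked by `[1,]` above); `mean_user_time` is a hypothetical name, and the vector length and repeat count are reduced here only to keep the run short:

```r
# Hypothetical helper: mean user CPU time over n calls of f().
mean_user_time <- function(f, n = 25) {
  times <- vapply(seq_len(n),
                  function(i) system.time(f())[["user.self"]],
                  numeric(1))
  mean(times)
}

origin <- as.POSIXct("1970-01-01")
len <- 1e5
t_rep <- mean_user_time(function() as.POSIXlt(rep(origin, len)), n = 5)
t_seq <- mean_user_time(function() as.POSIXlt(seq(origin, origin, length.out = len)),
                        n = 5)
c(rep = t_rep, seq = t_seq)
```

Passing the work as a function keeps each measurement to a single system.time() call.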
>
>
> posix.viaPOSIXlt1 = function(x) {
>   origin = as.POSIXct("1970-01-01")
>   z = as.POSIXlt(seq(origin, origin, length.out=length(x)))
>   date = trunc(x@.Data)
>   time = round(1e6*x@.Data%%1,2)
>   rtime = round(time)
>   z$sec=rtime%%1e2 + time%%1
>   z$min=(rtime%/%1e2)%%1e2
>   z$hour=rtime%/%1e4
>   z$mday=date%%100
>   z$mon=(date%/%100)%%100-1
>   z$year=date%/%10000-1900
>   as.double(z) # to POSIXct
> }
>
> posix.vialist = function(x) {
>   date = trunc(x@.Data)
>   time = round(1e6*x@.Data%%1,2)
>   rtime = round(time)
>   na = rep(0.0,length(x))
>   z = list(sec=rtime%%1e2 + time%%1,
>            min=(rtime%/%1e2)%%1e2,
>            hour=rtime%/%1e4,
>            mday=date%%100,
>            mon=(date%/%100)%%100-1,
>            year=date%/%10000-1900,
>            wday=na,yday=na,isdst=na)
>   class(z) = c("POSIXt","POSIXlt")
>   as.double(z) # to POSIXct
> }
>
> v1 = posix.viaISOdate(ts)
> v2 = posix.viaPOSIXlt1(ts)
> v3 = posix.vialist(ts)
>
> all(v1==v2 & v2==v3)
> # [1] TRUE
>
> mean(sapply(1:25,function(i) system.time(
> system.time(posix.viaISOdate(ts))
> ))[1,])
> # [1] 1.54244
>
> mean(sapply(1:25,function(i) system.time(
> system.time(posix.viaPOSIXlt1(ts))
> ))[1,])
> # [1] 0.37624
>
> mean(sapply(1:25,function(i) system.time(
> system.time(posix.vialist(ts))
> ))[1,])
> # [1] 0.35488
>
>
>
>
> sessionInfo()
> R version 2.6.2 (2008-02-08)
> x86_64-unknown-linux-gnu
>
> locale:
> LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=C;LC_MONETARY=en_GB.UTF-8;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> loaded via a namespace (and not attached):
> [1] rcompgen_0.1-17
>