[Rd] ISOdate/ISOdatetime performance suggestions, other date/time questions

Sklyar, Oleg (MI London) osklyar at maninvestments.com
Thu Apr 10 17:32:08 CEST 2008


Small correction to the definition of 'origin' in my message below:

# to ensure 0, although it will be overwritten when assigning hour
origin = as.POSIXct("1970-01-01")-as.numeric(as.POSIXct("1970-01-01")) 
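
A quick way to see what the correction changes, as a sketch rather than anything definitive: the literal "1970-01-01" is parsed in the session time zone, so its numeric value is 0 only where that zone had a zero UTC offset on that date. The name origin_utc and the explicit tz arguments are additions for illustration and are not in the original code.

```r
# "1970-01-01" is interpreted in the session time zone
naive  <- as.POSIXct("1970-01-01")
origin <- naive - as.numeric(naive)   # shift by the offset, in seconds

as.numeric(naive)    # minus the local UTC offset in seconds; 0 only in UTC/GMT
as.numeric(origin)   # 0 in every time zone

# Illustration only (assumes the system tz database has Europe/London):
# the UK was on UTC+1 throughout 1970, so local midnight is one hour early
as.numeric(as.POSIXct("1970-01-01", tz = "Europe/London"))   # -3600

# Alternative that avoids the subtraction by pinning the zone explicitly
origin_utc <- as.POSIXct("1970-01-01", tz = "UTC")
as.numeric(origin_utc)   # 0
```

Either form gives a zero-valued POSIXct; they differ only in the tzone attribute they carry.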

Dr Oleg Sklyar
Technology Group
Man Investments Ltd
+44 (0)20 7144 3803
osklyar at maninvestments.com 

> -----Original Message-----
> From: r-devel-bounces at r-project.org 
> [mailto:r-devel-bounces at r-project.org] On Behalf Of Sklyar, 
> Oleg (MI London)
> Sent: 10 April 2008 14:52
> To: R-devel at r-project.org
> Subject: [Rd] ISOdate/ISOdatetime performance suggestions, 
> other date/time questions
> 
> Dear list:
> 
> Working with date/times, I have found that ISOdate and 
> ISOdatetime are too slow on large vectors of data. I was 
> surprised until I looked at the implementation and the man 
> page: "ISOdatetime and ISOdate are convenience wrappers for 
> strptime". In other words, they convert the data to a 
> character representation first in order to create a POSIXlt 
> object, which is then converted to POSIXct. And POSIXct, 
> i.e. the number of seconds since 1970, is really what one 
> wants most often.
> 
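The round trip through character data described above can be reproduced by hand. This is a minimal sketch based on the help-page statement that ISOdatetime wraps strptime; the separator and format string are assumptions chosen for the illustration, not a copy of the base R source.

```r
# Numeric fields for 2008-04-10 14:04:44
year <- 2008; month <- 4; day <- 10; hour <- 14; mins <- 4; sec <- 44

txt    <- paste(year, month, day, hour, mins, sec, sep = "-")  # numbers -> text
lt     <- strptime(txt, "%Y-%m-%d-%H-%M-%OS", tz = "GMT")      # text -> POSIXlt
manual <- as.POSIXct(lt)                                       # POSIXlt -> POSIXct

# The convenience wrapper reaches the same instant
direct <- ISOdatetime(year, month, day, hour, mins, sec, tz = "GMT")
as.numeric(manual) == as.numeric(direct)
```

On long vectors the paste and strptime steps are where the string handling cost comes from.
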
> Obviously this is not a bug, but it is really a suboptimal 
> implementation of a pretty important function as the example 
> below shows.
> 
> Now my questions are:
> 
> - any chance that the implementation can be changed in R 
> (as suggested below; tz support still needs to be added)?
> - is there a better pure-R (no C) way than the one shown 
> below to convert to POSIXct?
> - any idea why, in the example below, fooling R into thinking 
> a list is POSIXlt is faster than creating a POSIXlt by rep 
> or seq? It's not a huge difference, but still. Unfortunately 
> seq on POSIXlt returns POSIXct anyway, so the class of 
> 'origin' is set correctly.
> - any idea why seq is faster than rep when applied to 
> POSIXct? There is hardly anything simpler than rep on double 
> values...
> 
> Thanks in advance for your comments,
> Oleg
> 
> It's common in finance to work with time stamps stored in a 
> form like %Y%m%d.%H%M%OS, e.g. 20080410.140444 for the current 
> time; this is what 'ts' holds in the example below:
> 
> ts = 1e4*trunc(rnorm(50000,2008,2)) + 1e2*trunc(runif(50000,1,12)) + 
>      trunc(runif(50000,1,28)) + 1e-2*trunc(runif(50000,1,24)) +
>      1e-4*trunc(runif(50000,1,60)) + 1e-6*runif(50000,1,60)
> 
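To make the encoding concrete, the field arithmetic used in the functions below can be checked on a single value. This is only a sketch: the variable name fields is introduced here for illustration, and the value is the example time stamp mentioned above.

```r
x <- 20080410.140444                 # 2008-04-10 14:04:44 as %Y%m%d.%H%M%OS

date  <- trunc(x)                    # integer part holds the date, 20080410
time  <- round(1e6 * (x %% 1), 2)    # fraction holds the time; rounding
                                     # absorbs floating-point error
rtime <- round(time)                 # whole seconds part, HHMMSS

fields <- c(year = date %/% 10000,
            mon  = (date %/% 100) %% 100,
            mday = date %% 100,
            hour = rtime %/% 1e4,
            min  = (rtime %/% 1e2) %% 1e2,
            sec  = rtime %% 1e2 + time %% 1)
fields
```

For POSIXlt the month and year must then be shifted (mon - 1, year - 1900), as the functions below do.
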
> posix.viaISOdate = function(x) {
>     date = trunc(x@.Data)
>     time = round(1e6*x@.Data%%1,2)
>     rtime = round(time)
>     z = list(sec=rtime%%1e2 + time%%1,
>             min=(rtime%/%1e2)%%1e2,
>             hour=rtime%/%1e4,
>             mday=date%%100,
>             mon=(date%/%100)%%100,
>             year=date%/%10000)
>     ISOdate(z$year,z$mon,z$mday,z$hour,z$min,z$sec) # to POSIXct
> }
> 
> ## This is just a test of which way is faster to create a long
> ## POSIXlt object, before the other implementations are given
> 
> origin = as.POSIXct("1970-01-01") 
> 
> mean(sapply(1:25,function(i) system.time(
>     as.POSIXlt(rep(origin,600000))
> ))[1,])
> # [1] 0.3972
> 
> mean(sapply(1:25,function(i) system.time(
>     as.POSIXlt(seq(origin, origin, length.out=600000))
> ))[1,])
> # [1] 0.30528
> 
> 
> posix.viaPOSIXlt1 = function(x) {
>     origin = as.POSIXct("1970-01-01") 
>     z = as.POSIXlt(seq(origin, origin, length.out=length(x)))
>     date = trunc(x@.Data)
>     time = round(1e6*x@.Data%%1,2)
>     rtime = round(time)
>     z$sec=rtime%%1e2 + time%%1
>     z$min=(rtime%/%1e2)%%1e2
>     z$hour=rtime%/%1e4
>     z$mday=date%%100
>     z$mon=(date%/%100)%%100-1
>     z$year=date%/%10000-1900
>     as.double(z) # to POSIXct
> }
> 
> posix.vialist = function(x) {
>     date = trunc(x@.Data)
>     time = round(1e6*x@.Data%%1,2)
>     rtime = round(time)
>     na = rep(0.0,length(x))
>     z = list(sec=rtime%%1e2 + time%%1,
>         min=(rtime%/%1e2)%%1e2,
>         hour=rtime%/%1e4,
>         mday=date%%100,
>         mon=(date%/%100)%%100-1,
>         year=date%/%10000-1900,
>         wday=na,yday=na,isdst=na)
>     class(z) = c("POSIXt","POSIXlt")
>     as.double(z) # to POSIXct
> }
> 
> v1 = posix.viaISOdate(ts)
> v2 = posix.viaPOSIXlt1(ts)
> v3 = posix.vialist(ts)
> 
> all(v1==v2 & v2==v3)
> # [1] TRUE
> 
> mean(sapply(1:25,function(i) system.time(
>     posix.viaISOdate(ts)
> ))[1,])
> # [1] 1.54244
> 
> mean(sapply(1:25,function(i) system.time(
>     posix.viaPOSIXlt1(ts)
> ))[1,])
> # [1] 0.37624
> 
> mean(sapply(1:25,function(i) system.time(
>     posix.vialist(ts)
> ))[1,])
> # [1] 0.35488
> 
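On the question of a pure-R route that avoids both strptime and POSIXlt: one possibility, offered only as a sketch, is to compute the seconds directly with a standard days-from-civil-date calculation for the proleptic Gregorian calendar. The function name utc_seconds is hypothetical and not part of base R. It assumes the fields are valid and already in UTC, so it handles neither time zones nor DST, and it has not been timed against the functions above.

```r
# Hypothetical helper: seconds since 1970-01-01 00:00:00 UTC from numeric
# calendar fields, vectorised, using arithmetic only.
# Assumptions: valid Gregorian dates, all fields expressed in UTC.
utc_seconds <- function(year, month, day, hour = 0, min = 0, sec = 0) {
    y   <- year - (month <= 2)       # count Jan/Feb as part of the previous year
    era <- y %/% 400                 # 400-year cycle of 146097 days
    yoe <- y - era * 400             # year within the cycle, 0..399
    mp  <- (month + 9) %% 12         # March = 0, ..., February = 11
    doy <- (153 * mp + 2) %/% 5 + day - 1            # day of the March-based year
    doe <- yoe * 365 + yoe %/% 4 - yoe %/% 100 + doy # day within the cycle
    days <- era * 146097 + doe - 719468              # days since 1970-01-01
    days * 86400 + hour * 3600 + min * 60 + sec
}

x <- utc_seconds(2008, 4, 10, 14, 4, 44)
as.POSIXct(x, origin = "1970-01-01", tz = "UTC")

# Cross-check against the strptime-based wrapper on a few dates
y <- c(1970, 1999, 2000, 2008); m <- c(1, 12, 2, 4); d <- c(1, 31, 29, 10)
all(utc_seconds(y, m, d) == as.numeric(ISOdatetime(y, m, d, 0, 0, 0, tz = "GMT")))
```

Supporting a local time zone would still need the offset for each instant, which is the part the POSIXlt conversion supplies.
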
> 
> 
> 
> sessionInfo()
> R version 2.6.2 (2008-02-08)
> x86_64-unknown-linux-gnu
> 
> locale:
> LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;
> LC_COLLATE=C;LC_MONETARY=en_GB.UTF-8;LC_MESSAGES=en_GB.UTF-8;
> LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;
> LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> loaded via a namespace (and not attached):
> [1] rcompgen_0.1-17
> 