[Rd] ISOdate/ISOdatetime performance suggestions, other date/time questions
Sklyar, Oleg (MI London)
osklyar at maninvestments.com
Thu Apr 10 15:51:55 CEST 2008
Dear list:
Working with date/times, I have found that ISOdate and
ISOdatetime are too slow on large vectors of data. I was surprised
until I looked at the implementation and the man page: "ISOdatetime and
ISOdate are convenience wrappers for strptime". In other words, they
convert the data to a character representation first in order to create a
POSIXlt object, which is then converted to POSIXct. And POSIXct, i.e. the
number of seconds since 1970, is really what one wants most often.
Obviously this is not a bug, but it is a suboptimal
implementation of a pretty important function, as the example below
shows.
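To make the character round trip concrete, here is a minimal sketch of the shape of such a wrapper. It is an illustration under assumptions, not a copy of the base R source: the name iso_via_strings is hypothetical, and tz defaults to "GMT" only to keep the comparison simple.

```r
# Hypothetical illustration of a strptime-based wrapper:
# numbers -> character -> POSIXlt -> POSIXct
iso_via_strings <- function(year, month, day, hour, min, sec, tz = "GMT") {
  # Step 1: paste the numeric fields into strings such as "2008-4-10-14-4-44"
  x <- paste(year, month, day, hour, min, sec, sep = "-")
  # Step 2: parse the strings into POSIXlt, then convert to POSIXct
  as.POSIXct(strptime(x, "%Y-%m-%d-%H-%M-%OS", tz = tz), tz = tz)
}

a <- iso_via_strings(2008, 4, 10, 14, 4, 44)
b <- ISOdatetime(2008, 4, 10, 14, 4, 44, tz = "GMT")
stopifnot(as.numeric(a) == as.numeric(b))
```

Both the paste and the parse step touch every element as a string, which is where the cost on long vectors is likely to come from.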
Now my questions are:
- is there any chance that the implementation can be changed in R (along
the lines suggested below; tz handling would still need to be added)?
- is there a better pure-R (no C) way than those shown below to convert
to POSIXct?
- any idea why, in the example below, fooling R into thinking a list is
POSIXlt is faster than just creating a POSIXlt by rep or seq? It's not a
huge difference, but still. Unfortunately seq on POSIXlt returns POSIXct
anyway, so the class of 'origin' is set correctly.
- any idea why seq is faster than rep when applied to POSIXct? There is
hardly anything simpler than rep on double values...
Thanks in advance for your comments,
Oleg
It's common in finance to work with time stamps stored in a form like
%Y%m%d.%H%M%OS, e.g. 20080410.140444 for now. This is what 'ts' in the
example below is:
ts = 1e4*trunc(rnorm(50000,2008,2)) + 1e2*trunc(runif(50000,1,12)) +
trunc(runif(50000,1,28)) + 1e-2*trunc(runif(50000,1,24)) +
1e-4*trunc(runif(50000,1,60)) + 1e-6*runif(50000,1,60)
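As a worked example of the layout, the sketch below decodes the single stamp 20080410.140444 into its fields with the same integer arithmetic the functions below rely on. One deliberate difference, which is an assumption of this sketch: it takes the whole-second part with floor() rather than round(), so that a fractional second of .5 or more is not carried into the whole seconds as well.

```r
# One stamp in the %Y%m%d.%H%M%OS layout
x <- 20080410.140444

date  <- trunc(x)                  # integer part: YYYYMMDD
time  <- round(1e6 * (x %% 1), 2)  # fractional part scaled to HHMMSS(.ss)
rtime <- floor(time)               # HHMMSS without fractional seconds

parts <- c(year = date %/% 10000,
           mon  = (date %/% 100) %% 100,
           mday = date %% 100,
           hour = rtime %/% 1e4,
           min  = (rtime %/% 1e2) %% 1e2,
           sec  = rtime %% 1e2 + time %% 1)
print(parts)
```

The round(..., 2) step matters because a double near 2e7 only resolves the fractional part to a few nanounits, which becomes a few thousandths of a second after scaling by 1e6.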
posix.viaISOdate = function(x) {
  date = trunc(x@.Data)
  time = round(1e6*x@.Data%%1,2)
  rtime = round(time)
  z = list(sec=rtime%%1e2 + time%%1,
           min=(rtime%/%1e2)%%1e2,
           hour=rtime%/%1e4,
           mday=date%%100,
           mon=(date%/%100)%%100,
           year=date%/%10000)
  ISOdate(z$year,z$mon,z$mday,z$hour,z$min,z$sec) # to POSIXct
}
## This is just a test of which way of creating a long POSIXlt object
## is faster, before the other implementations are given
origin = as.POSIXct("1970-01-01")
mean(sapply(1:25,function(i) system.time(
as.POSIXlt(rep(origin,600000))
))[1,])
# [1] 0.3972
mean(sapply(1:25,function(i) system.time(
as.POSIXlt(seq(origin, origin, length.out=600000))
))[1,])
# [1] 0.30528
posix.viaPOSIXlt1 = function(x) {
  origin = as.POSIXct("1970-01-01")
  z = as.POSIXlt(seq(origin, origin, length.out=length(x)))
  date = trunc(x@.Data)
  time = round(1e6*x@.Data%%1,2)
  rtime = round(time)
  z$sec=rtime%%1e2 + time%%1
  z$min=(rtime%/%1e2)%%1e2
  z$hour=rtime%/%1e4
  z$mday=date%%100
  z$mon=(date%/%100)%%100-1
  z$year=date%/%10000-1900
  as.double(z) # to POSIXct
}
posix.vialist = function(x) {
  date = trunc(x@.Data)
  time = round(1e6*x@.Data%%1,2)
  rtime = round(time)
  na = rep(0.0,length(x))
  z = list(sec=rtime%%1e2 + time%%1,
           min=(rtime%/%1e2)%%1e2,
           hour=rtime%/%1e4,
           mday=date%%100,
           mon=(date%/%100)%%100-1,
           year=date%/%10000-1900,
           wday=na,yday=na,isdst=na)
  class(z) = c("POSIXt","POSIXlt")
  as.double(z) # to POSIXct
}
v1 = posix.viaISOdate(ts)
v2 = posix.viaPOSIXlt1(ts)
v3 = posix.vialist(ts)
all(v1==v2 & v2==v3)
# [1] TRUE
mean(sapply(1:25,function(i) system.time(
posix.viaISOdate(ts)
))[1,])
# [1] 1.54244
mean(sapply(1:25,function(i) system.time(
posix.viaPOSIXlt1(ts)
))[1,])
# [1] 0.37624
mean(sapply(1:25,function(i) system.time(
posix.vialist(ts)
))[1,])
# [1] 0.35488
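On the question of a pure-R route, one further option is to skip POSIXlt altogether and compute the seconds since 1970 arithmetically. The following is only a sketch under stated assumptions, not something measured here: it uses the well-known "days from civil" calendar calculation, treats all stamps as GMT in the proleptic Gregorian calendar, ignores leap seconds, and takes whole seconds with floor(). The names days_from_civil and posix.viaarith are hypothetical.

```r
# Hypothetical helper: days between 1970-01-01 and a Gregorian date (vectorised)
days_from_civil <- function(y, m, d) {
  y   <- y - (m <= 2)                  # count Jan/Feb as the end of the previous year
  era <- y %/% 400                     # 400-year Gregorian cycle (floor division)
  yoe <- y - era * 400                 # year within the cycle, 0..399
  mp  <- (m + 9) %% 12                 # March = 0, ..., February = 11
  doy <- (153 * mp + 2) %/% 5 + d - 1  # day within the March-based year
  doe <- yoe * 365 + yoe %/% 4 - yoe %/% 100 + doy
  era * 146097 + doe - 719468          # shift so that 1970-01-01 is day 0
}

# Hypothetical converter for stamps in the %Y%m%d.%H%M%OS layout, GMT assumed
posix.viaarith <- function(x) {
  date  <- trunc(x)
  time  <- round(1e6 * (x %% 1), 2)
  rtime <- floor(time)
  days  <- days_from_civil(date %/% 10000, (date %/% 100) %% 100, date %% 100)
  secs  <- days * 86400 +
           (rtime %/% 1e4) * 3600 +
           ((rtime %/% 1e2) %% 1e2) * 60 +
           rtime %% 1e2 + time %% 1
  as.POSIXct(secs, origin = "1970-01-01", tz = "GMT")
}

# Check against ISOdatetime on a few whole-second stamps
x   <- c(20080410.140444, 19991231.235959, 20000229.000000)
ref <- ISOdatetime(c(2008, 1999, 2000), c(4, 12, 2), c(10, 31, 29),
                   c(14, 23, 0), c(4, 59, 0), c(44, 59, 0), tz = "GMT")
stopifnot(isTRUE(all.equal(as.numeric(posix.viaarith(x)), as.numeric(ref))))
```

Everything here is vectorised double arithmetic, so no character conversion or POSIXlt list is built; whether it is actually faster than the variants above would need to be timed on the same data, and local time zones or DST would need separate handling.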
sessionInfo()
R version 2.6.2 (2008-02-08)
x86_64-unknown-linux-gnu
locale:
LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=C;
LC_MONETARY=en_GB.UTF-8;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;
LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;
LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] rcompgen_0.1-17
Dr Oleg Sklyar
Technology Group
Man Investments Ltd
+44 (0)20 7144 3803
osklyar at maninvestments.com