[R] Problem(?) in strptime()
Don MacQueen
macq at llnl.gov
Mon Apr 8 21:33:38 CEST 2002
I think the following examples illustrate the crux of the matter
(version and OS info are below).
The problem has to do with the transition from standard time to
daylight savings time. My timezone, US/Pacific, has two parts:
standard time (PST) 8 hours behind GMT and daylight savings time
(PDT) 7 hours behind GMT. The transition takes place this year on 7
April at 02:00, when 02:00 is re-labeled 03:00.
## April 6, 01:30 and 02:30
> ISOdatetime(2002, 4, 6, 1:2, 30, 0,tz='GMT')
[1] "2002-04-05 17:30:00 PST" "2002-04-05 18:30:00 PST"
## April 7, 01:30 and 02:30
> ISOdatetime(2002, 4, 7, 1:2, 30, 0,tz='GMT')
[1] "2002-04-06 17:30:00 PST" "2002-04-06 17:30:00 PST"
The dates supplied are one day apart. The times supplied are one hour
apart. However, the times returned are one hour apart in the first
case, but identical in the second case.
> tmp <- ISOdatetime(2002, 4, 7, 1:2, 30, 0,tz='GMT')
> identical(tmp[1],tmp[2])
[1] TRUE
Of the four values returned, the last one is incorrect, because
2002-4-7 2:30 GMT truly is 2002-4-6 18:30 PST.
What I need is a way to have that fourth case interpreted correctly.
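For reference, a minimal sketch of the four values one would expect,
assuming an R release in which ISOdatetime() passes tz on to the
parser (the 1.4.1 source shown further down does not), and assuming
"America/Los_Angeles" is available as a name for the US/Pacific zone:

```r
# Assumption: ISOdatetime() hands tz on to the parser, so the fields are
# read as GMT rather than as local time.
apr6 <- ISOdatetime(2002, 4, 6, 1:2, 30, 0, tz = "GMT")
apr7 <- ISOdatetime(2002, 4, 7, 1:2, 30, 0, tz = "GMT")

# Both pairs should be exactly one hour (3600 s) apart.
diff(as.numeric(apr6))
diff(as.numeric(apr7))

# Display the same instants on the US Pacific clock.
# "America/Los_Angeles" is an assumed zone name, not taken from the post.
fmt <- "%Y-%m-%d %H:%M:%S"
format(c(apr6, apr7), fmt, tz = "America/Los_Angeles", usetz = TRUE)
# [1] "2002-04-05 17:30:00 PST" "2002-04-05 18:30:00 PST"
# [3] "2002-04-06 17:30:00 PST" "2002-04-06 18:30:00 PST"
```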
Investigating a bit:
> ISOdatetime
function (year, month, day, hour, min, sec, tz = "")
{
x <- paste(year, month, day, hour, min, sec)
as.POSIXct(strptime(x, "%Y %m %d %H %M %S"), tz = tz)
}
> strptime
function (x, format)
.Internal(strptime(x, format))
ISOdatetime() uses strptime(), and strptime() does not use the
timezone information. Indeed, from ?strptime, TZ as part of a format
specification is available for output only.
As far as I can tell, strptime() interprets everything in the local
timezone, and when given a time such as 2002-4-7 2:30 that "doesn't
exist" in the local timezone, it makes a reasonable attempt to guess
what the user meant. That guess, however, gives the wrong answer in
ISOdatetime() whenever tz is something other than ''.
What I need is a way to tell R that the date-time string really truly
should be interpreted as GMT. I haven't found a way. (maybe setenv TZ
GMT before starting R, but I'm still exploring that)
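A sketch of what "interpret this string as GMT" could look like,
assuming a version of R in which strptime() accepts a tz argument.
That argument is not part of the signature shown above, so treat this
as an illustration rather than something that runs on 1.4.1:

```r
# Assumption: strptime() accepts tz=, so the string is read as GMT
# whatever the TZ environment variable is set to.
lt <- strptime("2002 4 7 2 30 0", "%Y %m %d %H %M %S", tz = "GMT")
ct <- as.POSIXct(lt)

as.numeric(ct)   # seconds since 1970-01-01 00:00:00 GMT
# [1] 1018146600
format(ct, "%Y-%m-%d %H:%M:%S", tz = "GMT")
# [1] "2002-04-07 02:30:00"
```

Because the fields are never passed through the local timezone, the
missing hour on 7 April in US/Pacific does not come into play.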
Also as far as I can tell, strptime() uses an OS-supplied strptime if
one is available, and R is entirely dependent on its behavior. I
don't entirely understand what man strptime on my system says about
this, but maybe it suggests that timezone information might be used
if provided...
%Z Timezone name or no characters if no time zone infor-
mation exists. Local timezone information is used as
though strptime() called tzset() (see ctime(3C)).
Errors may not be detected. This behavior is subject
to change in a future release.
> Sys.getlocale()
[1] "C"
> Sys.getenv('TZ')
TZ
"US/Pacific"
> version
_
platform sparc-sun-solaris2.7
arch sparc
os solaris2.7
system sparc, solaris2.7
status
major 1
minor 4.1
year 2002
month 01
day 30
language R
I tried changing the locale
Sys.setlocale('LC_TIME','en_GB')
(based on entries in /usr/lib/locale/lcttab), and
Sys.putenv('TZ=GMT')
to no avail.
----------------------------------------------
This whole thing is motivated by the fact that I am receiving some
data that is time-stamped, and the time stamps (in addition to having
a poorly chosen format) ignore the daylight savings time convention.
That is, they always use an 8 hour offset from GMT. Thus, the three
times shown below are in fact an hourly sequence.
Sun Apr 07 01:30:58 2002
Sun Apr 07 02:30:58 2002
Sun Apr 07 03:30:58 2002
In order to convert these correctly to POSIXct, I thought a
reasonable approach would be to tell R that they are in GMT, read
them as such, and then convert to US/Pacific.
Here is what I have been using.
tmpd <- c('Sun Apr 07 01:30:58 2002',
'Sun Apr 07 02:30:58 2002',
'Sun Apr 07 03:30:58 2002')
tmpt <- as.POSIXct(strptime(tmpd,'%a %b %d %H:%M:%S %Y'),tz='GMT')+28800
> tmpt
[1] "2002-04-07 01:30:58 PST" "2002-04-07 01:30:58 PST" "2002-04-07 04:30:58 PDT"
It works for the first and last times, but not the middle one
(3:30 "PST" = 4:30 PDT is correct, but 2:30 "PST" should be 3:30 PDT).
I would appreciate help finding a way that works for all of them
simultaneously.
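One possible sketch of the "read as GMT, then shift by 8 hours"
approach, under two assumptions that do not hold for the 1.4.1 setup
described here: that strptime() accepts a tz argument, and that
"America/Los_Angeles" is available as a name for the US/Pacific zone.

```r
# The day and month names are English, so parse in the "C" locale.
invisible(Sys.setlocale("LC_TIME", "C"))

tmpd <- c("Sun Apr 07 01:30:58 2002",
          "Sun Apr 07 02:30:58 2002",
          "Sun Apr 07 03:30:58 2002")

# Assumption: strptime() accepts tz=, so the clock readings are taken as
# GMT and no local daylight-saving rule is applied while parsing.
as_gmt <- as.POSIXct(strptime(tmpd, "%a %b %d %H:%M:%S %Y", tz = "GMT"))

# The stamps run a fixed 8 hours behind GMT, so the true instants are
# 28800 s later than the values just parsed.
tmpt <- as_gmt + 8 * 3600

diff(as.numeric(tmpt))
# [1] 3600 3600
format(tmpt, "%Y-%m-%d %H:%M:%S", tz = "America/Los_Angeles", usetz = TRUE)
# [1] "2002-04-07 01:30:58 PST" "2002-04-07 03:30:58 PDT" "2002-04-07 04:30:58 PDT"
```

The point of the design is that the daylight savings rule is applied
only when the instants are displayed, never when the strings are read.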
Thanks
-Don
-----------------
Here are some more attempts at various ways of looking at these
dates, if anyone cares to wade through them.
> strptime('2002-4-7 1:30' , '%Y-%m-%d %H:%M')
[1] "2002-04-07 01:30:00"
> strptime('2002-4-7 2:30' , '%Y-%m-%d %H:%M')
[1] "2002-04-07 01:30:00"
> strptime('2002-4-7 3:30' , '%Y-%m-%d %H:%M')
[1] "2002-04-07 03:30:00"
The first and last display as two hours apart.
The second one is interpreted by strptime() to be the same as the
first one. Not unreasonable, but problematic as illustrated above.
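Since the parser silently substitutes some other time for one that
does not exist, a round-trip check can at least flag such inputs. The
following is only a sketch: it assumes strptime() accepts a tz
argument and that "America/Los_Angeles" names the US/Pacific zone,
and the variable names are made up for illustration.

```r
fmt <- "%Y-%m-%d %H:%M"
x   <- c("2002-04-07 01:30", "2002-04-07 02:30", "2002-04-07 03:30")

# Assumptions: strptime() accepts tz=, and the zone name is available.
# What the 02:30 entry parses to is platform dependent (it may be NA).
ct   <- as.POSIXct(strptime(x, fmt, tz = "America/Los_Angeles"))
back <- format(ct, fmt, tz = "America/Los_Angeles")

# TRUE marks inputs that did not survive the round trip, i.e. clock
# readings that do not exist in that zone.
nonexistent <- is.na(back) | back != x
nonexistent
```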
-------
> as.numeric(as.POSIXct(strptime('2002-4-7 1:30' , '%Y-%m-%d %H:%M')))
[1] 1018171800
> as.numeric(as.POSIXct(strptime('2002-4-7 2:30' , '%Y-%m-%d %H:%M')))
[1] 1018171800
> as.numeric(as.POSIXct(strptime('2002-4-7 3:30' , '%Y-%m-%d %H:%M')))
[1] 1018175400
> 1018175400 - 1018171800
[1] 3600
But in fact, the first and last are only one hour apart. This is
correct, because the first one is PST and the third one is PDT.
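The same arithmetic can be cross-checked without involving the local
timezone at all, by writing each local time as its GMT equivalent.
This sketch assumes an ISOdatetime() that honours tz = "GMT" when
reading the fields; the variable names are illustrative only.

```r
# 01:30 PST (GMT-8) is 09:30 GMT; 03:30 PDT (GMT-7) is 10:30 GMT.
pst_0130 <- as.numeric(ISOdatetime(2002, 4, 7,  9, 30, 0, tz = "GMT"))
pdt_0330 <- as.numeric(ISOdatetime(2002, 4, 7, 10, 30, 0, tz = "GMT"))

pst_0130
# [1] 1018171800
pdt_0330
# [1] 1018175400
pdt_0330 - pst_0130
# [1] 3600
```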
-------
> as.POSIXct(strptime('2002-4-7 1:30' , '%Y-%m-%d %H:%M'),tz='GMT')
[1] "2002-04-06 17:30:00 PST"
> as.POSIXct(strptime('2002-4-7 2:30' , '%Y-%m-%d %H:%M'),tz='GMT')
[1] "2002-04-06 17:30:00 PST"
> as.POSIXct(strptime('2002-4-7 3:30' , '%Y-%m-%d %H:%M'),tz='GMT')
[1] "2002-04-06 19:30:00 PST"
> as.POSIXct(strptime('2002-4-7 1:30' , '%Y-%m-%d %H:%M'),tz='US/Pacific')
[1] "2002-04-07 01:30:00 PST"
> as.POSIXct(strptime('2002-4-7 2:30' , '%Y-%m-%d %H:%M'),tz='US/Pacific')
[1] "2002-04-07 01:30:00 PST"
> as.POSIXct(strptime('2002-4-7 3:30' , '%Y-%m-%d %H:%M'),tz='US/Pacific')
[1] "2002-04-07 03:30:00 PDT"
--
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
--------------------------------------