[R] From POSIXct to numeric and back with time zone

David Winsemius dwinsemius at comcast.net
Fri Aug 23 19:12:03 CEST 2013


On Aug 23, 2013, at 3:12 AM, Daniel Haugstvedt wrote:

> I am replying to my own question in case someone else finds this thread and needs help with the same problem. Thanks to Mark Leeds for helping me on my way. Any errors or flaws are mine since I have rewritten most of his comments to make sure I understood them correctly.
> 
> First three general recommendations for time zone problems:
> 
> 1) When asking time zone related questions always give OS information. It does not hurt to give information on version etc. either. My system is OS X Mountain Lion (10.8.4). Using str(R.Version()) to get system information is one option.
> 
> str(R.Version())
> List of 14
> $ platform      : chr "x86_64-apple-darwin9.8.0"
> $ arch          : chr "x86_64"
> $ os            : chr "darwin9.8.0"
> $ system        : chr "x86_64, darwin 9.8.0"
> .
> .
> .
> $ version.string: chr "R version 2.15.1 (2012-06-22)"
> $ nickname      : chr "Roasted Marshmallows"
> 
> 2) Before you do ANYTHING with time zones, put Sys.setenv(TZ = "UTC") in your .Rprofile or at the top of the code you're working in. Otherwise, if you start trying to convert date-time objects to plain date objects, things can really get whacked.
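
A minimal sketch of why the date conversion can go wrong, assuming the tz database on the machine knows "CET"; the variable names are only illustrative. The same instant falls on different calendar days depending on which time zone is used for the day boundary, so passing tz explicitly to as.Date() avoids relying on any default:

```r
# Midnight on 2000-01-30 in CET is 23:00 on 2000-01-29 in UTC.
x <- as.POSIXct("2000-01-30 00:00:00", tz = "CET")

# as.Date() on a POSIXct takes its own tz argument; passing it explicitly
# makes clear which calendar the day boundary comes from.
day_utc <- as.Date(x, tz = "UTC")
day_cet <- as.Date(x, tz = "CET")

print(day_utc)
print(day_cet)
```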
> 
> 3) Check that the time zone you are using is valid. 
> 
> I am no expert on this, but from what I understand, in OSX a valid time zone has the name of one of the files in the folder
> 
> /usr/share/zoneinfo,
> 
> with some obvious exceptions like the files "iso3166.tab", "posixrules" and "zone.tab". It can also be one of the entries in the file, /usr/share/zoneinfo/zone.tab.
> 
> CET and CEST (daylight saving time) are the time zones my system uses when nothing is specified. I am sorry for writing ETC in one of the lines in the first email.
> 
> 
> Now, to the problem: How do I change from POSIXct to numeric and back with another time zone than UTC?
> 
> I have tried to simplify the original question and attempted an answer. Please correct me if I am wrong.
> 
> 
> 
> 
> Sys.setenv(TZ = "UTC")
> 
> ## Number of seconds from '1970-01-01 00:00:00 UTC' to '2000-01-30 00:00:00 CET' not 
> ## counting leap seconds. Display as CET date
> tmp = as.POSIXct( '2000-01-30', origin = '1970-01-01' , tz = "CET")
> 
> ## Number of seconds from '1970-01-01 00:00:00 UTC' to '2000-01-30 00:00:00 CET' not 
> ## counting leap seconds. Display as UTC date
> tmp2 =as.POSIXct( as.numeric( tmp ),origin = '1970-01-01' , tz = "UTC")
> 
> ## What I wanted was to go to numeric and back to the original with the same time zone. What I got was
> ## the number of seconds from '1970-01-01 00:00:00 UTC' to '2000-01-30 00:00:00 UTC' not 
> ## counting leap seconds. Display as CET date. Which is 60*60 seconds less than I expected.
> tmp3 = as.POSIXct( as.numeric( tmp ),origin = '1970-01-01' , tz = "CET")
> 
> ## Solution: Convert to the desired time zone after as.POSIXct has been used with UTC to get the 
> ## correct number of seconds
> tmp4 = tmp2
> attributes(tmp4)$tzone = 'CET'
> 
> 
> tmp
> [1] "2000-01-30 CET"
>> tmp2
> [1] "2000-01-29 23:00:00 UTC"
>> tmp3
> [1] "2000-01-29 23:00:00 CET"
>> tmp4
> [1] "2000-01-30 CET"
>> 
>> as.numeric(tmp)
> [1] 949186800
>> as.numeric(tmp2)
> [1] 949186800
>> as.numeric(tmp3)
> [1] 949183200
>> as.numeric(tmp4)
> [1] 949186800
> 
> 
> My conclusions are 
> 1) The tz argument sets the tzone attribute but it also determines how the entered date should be interpreted IF the date is entered as a string. 
> 2) If the date is entered as numeric it is assumed to be the number of seconds from UTC to UTC and the tz argument is used to add / subtract the number of seconds which converts it to the time zone specified.

Thank you for this discussion (and I share your pain.) You might want to look at the various `as.POSIXct` methods with:

methods(as.POSIXct)

I see 10 methods on my machine at the moment, but some are from the zoo package, so you may see a different number. You are describing differences in how `as.POSIXct.numeric` behaves versus `as.POSIXct.default`, which I believe is where a character or factor argument ends up after first being passed through `as.POSIXlt`. Other parts of your question illustrate the behavior of `as.POSIXct.Date`, `as.POSIXct.dates`, and `as.POSIXct.default`. I have had similar difficulties understanding TZ behavior. It would be nice if R automagically looked up the current setting of my system time zone.
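
A small sketch of the character-versus-numeric difference, assuming a tz database that knows "CET"; the variable names are only illustrative. Character input is parsed as wall-clock time in `tz`, whereas a numeric value is already a count of seconds. How `origin` is interpreted when `tz` is something other than UTC may depend on the R version, so this sketch passes tz = "UTC" for the numeric step and only then changes the `tzone` attribute, as in the solution quoted above:

```r
# Character input: parsed as wall-clock time in the given time zone.
x_cet <- as.POSIXct("2000-01-30", tz = "CET")
x_utc <- as.POSIXct("2000-01-30", tz = "UTC")

# Midnight CET is one hour earlier than midnight UTC on this date.
offset_secs <- as.numeric(x_utc) - as.numeric(x_cet)

# Numeric round trip: rebuild in UTC, then switch only the display zone.
secs <- as.numeric(x_cet)
back <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
attr(back, "tzone") <- "CET"

# The underlying seconds are untouched; only the printed zone changed.
same_instant <- as.numeric(back) == secs
shown <- format(back, "%Y-%m-%d %H:%M:%S %Z")
print(shown)
```

Changing the `tzone` attribute never alters the stored number of seconds, which is why the round trip comes back to the original instant.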

> 
> Some additional conclusions that I came across while testing a bit. The code which made me draw them are attached at the end. 
> 3) If a time zone is not needed the tz argument does nothing. It sets the tzone but it does not change it.
> 4) The origin is assumed to be UTC regardless of what Sys.timezone() says as long as no time zone for the origin is specified. I checked this by changing the Sys.timezone() to CET before running the example again. 

My understanding: All items in a vector need to be in the same TZ. You cannot mix the TZ argument within a vector. Hence, it may be better to always convert to UTC for storage (and for mental clarity) and only use TZ's for the format.POSIXt output.
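
As a rough illustration of that approach (a sketch only, assuming "CET" is known to the local tz database, with made-up variable names), the instant is stored once in UTC and the time zone is applied only when formatting:

```r
# One instant, stored once as seconds since the epoch, labelled UTC.
stored <- as.POSIXct("2000-01-29 23:00:00", tz = "UTC")

# The time zone is applied only when formatting for display.
as_utc <- format(stored, "%Y-%m-%d %H:%M", tz = "UTC")
as_cet <- format(stored, "%Y-%m-%d %H:%M", tz = "CET")

# Combining values created with different tz settings gives one vector with
# at most one tzone attribute; the underlying seconds are unchanged.
mixed <- c(as.POSIXct("2000-01-30", tz = "CET"), stored)
all_same_instant <- length(unique(as.numeric(mixed))) == 1

print(c(as_utc, as_cet))
```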

I am not able to figure out where the canonical translation for TZ abbreviations lies on my MacOS 10.8.5 machine. The file read by the example code in

?timezone

with

tzfile <- "/usr/share/zoneinfo/zone.tab"

... does not have any three letter abbreviations. My value for TZ is numbered 390, with a name of  "America/Los_Angeles". My Sys.time function behaves properly:

> Sys.time()
[1] "2013-08-23 09:54:58 PDT"

But: "How do it know?"

If I change my system TZ to US Central Daylight time there is no apparent recognition of that fact in the output of Sys.time() ... I think I may need to change a locale variable. If I enter a TZ with Sys.setenv() I get CDT as the output:

> Sys.setenv(TZ = "America/Chicago")
> Sys.time()
[1] "2013-08-23 12:00:11 CDT"

So I can change R's understanding of the local TZ if I use the full tzone$name value but offering a value of either PDT or PST will fail for the Sys.setenv(TZ=) argument. (Sys.getenv('TZ') returns "" when I start R.)
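
A hedged sketch of how one might check a name before using it, and set TZ only temporarily. It assumes an R version that provides OlsonNames() (the R 2.15.1 shown earlier in this thread may not), and `is_known_tz` and `with_tz` are hypothetical helper names, not base R functions:

```r
# Hypothetical helper: is `tz` one of the names R's tz database lists?
is_known_tz <- function(tz) tz %in% OlsonNames()

# Hypothetical helper: evaluate `expr` with TZ set, then restore the old value.
with_tz <- function(tz, expr) {
  old <- Sys.getenv("TZ", unset = NA)
  Sys.setenv(TZ = tz)
  on.exit(if (is.na(old)) Sys.unsetenv("TZ") else Sys.setenv(TZ = old))
  expr  # lazily evaluated here, i.e. after TZ has been set
}

chicago_ok <- is_known_tz("America/Chicago")
pdt_ok     <- is_known_tz("PDT")

# Abbreviation R reports for a summer instant under America/Chicago.
abbr <- with_tz("America/Chicago",
                format(as.POSIXct("2013-08-23 12:00:00"), "%Z"))

print(c(chicago_ok, pdt_ok))
print(abbr)
```

Restoring the previous TZ in on.exit() keeps the change from leaking into the rest of the session.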

Best of luck in this quest and thanks for the illuminating exercises. (And I thought your spelling was as good as most native English speakers, certainly better than I generally exhibit, so I think you should stop apologizing.)

-- 
David.

> 
> Best regards
> 
> Daniel Haugstvedt
> Ph.d student
> NTNU, Trondheim, Norway
> 
> 
> ## If a time zone is not needed the tz argument does nothing. It sets the tzone but it does not change it.
> Sys.setenv(TZ = "UTC")
> tmp = as.POSIXct( '2000-01-30', origin = '1970-01-01' , tz = "CET")
> tmp2 = as.POSIXct(tmp, tz = "CET")
> 
>> tmp
> [1] "2000-01-30 CET"
>> tmp2
> [1] "2000-01-30 CET"
> 
> 
> ## Sys.setenv does not change the time zone of the origin
> Sys.setenv(TZ = "CET")
> 
> tmp5 = as.POSIXct( '2000-01-30', origin = '1970-01-01' , tz = "CET")
> tmp6 =as.POSIXct( as.numeric( tmp5 ),origin = '1970-01-01' , tz = "UTC")
> tmp7 = as.POSIXct( as.numeric( tmp5 ),origin = '1970-01-01' , tz = "CET")
> 
> tmp5
> [1] "2000-01-30 CET"
>> tmp6
> [1] "2000-01-29 23:00:00 UTC"
>> tmp7
> [1] "2000-01-29 23:00:00 CET"
>> 
>> as.numeric(tmp5)
> [1] 949186800
>> as.numeric(tmp6)
> [1] 949186800
>> as.numeric(tmp7)
> [1] 949183200
> 
> 
> 
> On 22 Aug 2013, at 15:22, Daniel Haugstvedt <daniel.haugstvedt at gmail.com> wrote:
> 
>> From POSIXct to numeric and back with time zone 
>> 
>> I am running regressions on data which has time series with different time resolution. Some data has hourly resolution, while most has either daily or weekly resolution. Aggregation is used to make the hourly data daily, while linear interpolation is used to find daily data from the weekly time series. This data manipulation requires some careful handling of date and time.
>> 
>> I do travel across time zones and want my code to keep working as the system time zone changes.
>> 
>> So far quick fixes have been used to handle problems. Now I am trying to get a grip and make a more robust solution. Google and forums have left me with an increasing amount of questions instead of answers. 
>> 
>> I have chosen one question and one problem. The question, which should be trivial, should allow me to solve the problem. However, I have been stuck with this all day so if anyone knows the solution to the problem straight away, it will be highly appreciated.
>> 
>> 
>> The question: What does the tz attribute in POSIXct do?
>> 
>> 
>> 
>> As an example, two dates with different time zone attributes, tmp1 and tmp2, are compared.
>> 
>> 
>>> tmp1 = as.POSIXct('2000-01-30',origin = '1970-01-01', tz = "UTC")
>> 
>>> tmp1
>> 
>> [1] "2000-01-30 UTC"
>> 
>> 
>>> tmp2 = as.POSIXct('2000-01-30',origin = '1970-01-01', tz = "ETC")
>> 
>>> tmp2
>> 
>> [1] "2000-01-30 UTC"
>> 
>> 
>> The time displayed, including the time zone, is the same but the tzone attributes are not.
>> 
>> 
>>> attributes(tmp1)
>> 
>> $class
>> 
>> [1] "POSIXct" "POSIXt"
>> 
>> 
>> $tzone
>> 
>> [1] "UTC"
>> 
>> 
>> 
>>> attributes(tmp2)
>> 
>> $class
>> 
>> [1] "POSIXct" "POSIXt"
>> 
>> 
>> $tzone
>> 
>> [1] "ETC"
>> 
>> 
>> As a final check the numbers are compared
>> 
>> 
>>> as.numeric(tmp1)
>> 
>> [1] 949190400
>> 
>>> as.numeric(tmp2)
>> 
>> [1] 949190400
>> 
>> 
>> and they match.
>> 
>> 
>> I was under the impression that POSIXct always used UTC and that the tzone attribute was only for displaying and converting to POSIXlt but that seems wrong in the above example. As far as I can see, the tzone attribute is neither used for display, as both dates display as UTC, nor used to change the origin, as both numbers are the same. My question is, what does the tzone attribute in POSIXct actually do?
>> 
>> 
>> I hope increased understanding of that part will let me solve the true problem without further assistance.
>> 
>> 
>> 
>> 
>> The problem: from POSIXct to numeric and back.
>> 
>> 
>>> tmp3 = as.POSIXct( '2000-01-30', origin = '1970-01-01' )
>> 
>> tmp3
>> 
>> [1] "2000-01-30 CET" 
>> 
>> 
>> Converting it to numeric and back to POSIXct it becomes
>> 
>>> as.POSIXct( as.numeric( tmp3 ),origin = '1970-01-01' )
>> 
>> [1] "2000-01-29 23:00:00 CET"
>> 
>> 
>> which is "2000-01-30 UTC". By converting to numeric and back to POSIXct, an hour has been added. This is not the behavior I want. I am trying to set the tz attribute but it does not change the added hour.
>> 
>> 
>> Trying to understand more of what is going on and to replicate the original date, I set the time zone to be CET in both conversions.
>> 
>> 
>> as.POSIXct( as.numeric( as.POSIXct( '2000-01-30', origin = '1970-01-01', tz = "CET" ) ), origin = '1970-01-01', tz = "CET" )
>> 
>> [1] "2000-01-29 23:00:00 CET"
>> 
>> 
>> Which is "2000-01-30 UTC". Choosing to set the time zone to be UTC in both conversions gives
>> 
>> 
>> as.POSIXct( as.numeric( as.POSIXct( '2000-01-30', origin = '1970-01-01', tz = "UTC" ) ),
>> origin = '1970-01-01', tz = "UTC" )
>> 
>> [1] "2000-01-30 UTC"
>> 
>> 
>> I want to convert the date "2000-01-30 CET" to POSIXct and then over to numeric before finally converting back to POSIXct without changing the date, time or time zone. I seem to get  "2000-01-30 UTC" regardless of what I try so I am definitely missing something obvious.
>> 
>> 
>> Best Regards
>> 
>> 
>> Daniel Haugstvedt
>> 
>> Ph.d.-student, 
>> 
>> NTNU, Trondheim, Norway
>> 
>> 
>> PS. I am aware that my spelling is poor. Any comments on how it could be improved are appreciated but send it to me personally and not the list.  
> 
-- 

David Winsemius
Alameda, CA, USA


