[R] strange date problem - May 3, 1992 is NA

Brian Diggs diggsb at ohsu.edu
Wed Jun 22 23:09:37 CEST 2011


On 6/22/2011 1:37 PM, Alexander Shenkin wrote:
> On 6/22/2011 3:34 PM, Brian Diggs wrote:
>> On 6/22/2011 12:09 PM, Luke Miller wrote:
>>> For what it's worth, I cannot reproduce this problem under a nearly
>>> identical instance of R (R 2.12.1, Win 7 Pro 64-bit). I also can't
>>> reproduce the problem with R 2.13.0. You've got something truly weird
>>> going on with your particular instance of R.
>>>
>>>
>>>> is.na(strptime("5/3/1992", format="%m/%d/%Y"))
>>> [1] FALSE
>>>> is.na(strptime("5/2/1992", format="%m/%d/%Y"))
>>> [1] FALSE
>>>> sessionInfo()
>>> R version 2.12.1 (2010-12-16)
>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>
>>> locale:
>>> [1] LC_COLLATE=English_United States.1252
>>> [2] LC_CTYPE=English_United States.1252
>>> [3] LC_MONETARY=English_United States.1252
>>> [4] LC_NUMERIC=C
>>> [5] LC_TIME=English_United States.1252
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] rj_0.5.2-1      lattice_0.19-17
>>>
>>> loaded via a namespace (and not attached):
>>> [1] grid_2.12.1  rJava_0.8-8  tools_2.12.1
>>
>> Like Luke, I can not reproduce what you see in (an old installation of)
>> R 2.12.1 (and it also didn't have rj, lattice, grid, rJava, or tools
>> attached or loaded in any way).
>>
>> My vague gut feeling is it might be a timezone/daylight savings time
>> related issue (though usually times have to be involved).  At least,
>> that is a common problem with weird things happening with dates.
>>
>> What do you get as output for the following?
>>
>> Sys.timezone()
>> Sys.info()
>> conflicts()
>> dput(strptime("5/3/1992", format="%m/%d/%Y"))
>> dput(as.POSIXct(strptime("5/3/1992", format="%m/%d/%Y")))
>> dput(strptime("5/2/1992", format="%m/%d/%Y"))
>> dput(as.POSIXct(strptime("5/2/1992", format="%m/%d/%Y")))
>
>> Sys.timezone()
> [1] "COT"
>> Sys.info()
>                      sysname                      release
>                    "Windows"                      "7 x64"
>                      version                     nodename
> "build 7601, Service Pack 1"               "machine_name"
>                      machine                        login
>                        "x86"                   "username"
>                         user
>                   "username"
>> conflicts()
> [1] "untangle.specials" "body<-"            "format.pval"
> "round.POSIXt"      "trunc.POSIXt"      "units"
>> dput(strptime("5/3/1992", format="%m/%d/%Y"))
> structure(list(sec = 0, min = 0L, hour = 0L, mday = 3L, mon = 4L,
>      year = 92L, wday = 0L, yday = 123L, isdst = -1L), .Names = c("sec",
> "min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
> ), class = c("POSIXlt", "POSIXt"))
>> dput(as.POSIXct(strptime("5/3/1992", format="%m/%d/%Y")))
> structure(NA_real_, class = c("POSIXct", "POSIXt"), tzone = "")
>> dput(strptime("5/2/1992", format="%m/%d/%Y"))
> structure(list(sec = 0, min = 0L, hour = 0L, mday = 2L, mon = 4L,
>      year = 92L, wday = 6L, yday = 122L, isdst = 0L), .Names = c("sec",
> "min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
> ), class = c("POSIXlt", "POSIXt"))
>> dput(as.POSIXct(strptime("5/2/1992", format="%m/%d/%Y")))
> structure(704782800, class = c("POSIXct", "POSIXt"), tzone = "")
>

Fun :)

So, not being familiar with COT, I looked it up to see when the
daylight saving time switchovers are/were.

http://www.timeanddate.com/worldclock/timezone.html?n=41&syear=1990

Daylight saving time started (in 1992 only) on "Midnight between
Saturday, May 2 and Sunday, May 3" and ended (in 1993) on "Midnight
between Saturday, April 3 and Sunday, April 4". In particular, the clock
went from Saturday, May 2, 1992 11:59:59 PM straight to Sunday, May 3,
1992 1:00:00 AM, so there was no midnight on May 3.  When strptime
converts a date with no time part, it sets the time to midnight by
default.  That time does not exist under the DST rules (which is why
isdst gets set to -1), so converting to a POSIXct gives NA.  is.na() on
a POSIXlt goes through that same conversion, which is why you get TRUE.
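
To make the gap visible, here is a small sketch.  It assumes that the
Olson name "America/Bogota" is the zone behind what Sys.timezone()
reports as "COT", and that the tz database on your machine carries the
1992 Colombian rule; the variable names are mine, not from your session.

```r
# Assumption: "America/Bogota" is the Olson zone behind "COT".
tz <- "America/Bogota"

# Last second of May 2, 1992, still on standard time (UTC-5).
before <- as.POSIXct("1992-05-02 23:59:59", tz = tz)

# POSIXct counts seconds since the epoch, so adding 1 is always valid,
# even when the wall clock skips ahead.
after <- before + 1

# Wall-clock rendering of both instants in that timezone.
gap <- format(c(before, after), "%Y-%m-%d %H:%M:%S", tz = tz)
print(gap)
```

If the second element reads 01:00:00 rather than 00:00:00, your tz
database has the same rule as the page above, and midnight on May 3
does not exist in that zone.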

There are probably a lot of places in R that assume midnight is a valid
time, so I don't know what will and will not work in that timezone (you
will probably also have problems with seq and cut on POSIXct/POSIXlt
objects in that timezone, at least).  I'd recommend using a different
timezone.  Or, if you don't need times, use Date (which doesn't have
timezones and so avoids this):

as.Date("5/3/1992", format="%m/%d/%Y")
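
A rough side-by-side of the two workarounds, in case it helps.  The
variable names are mine, and tz = "UTC" is just one possible choice of
a zone with no DST, not necessarily the right one for your data.

```r
x <- "5/3/1992"

# Workaround 1: Date has no time of day and no timezone.
d <- as.Date(x, format = "%m/%d/%Y")

# Workaround 2: parse in a zone without DST (UTC is an assumption here),
# so midnight always exists and the POSIXct conversion succeeds.
p <- as.POSIXct(strptime(x, format = "%m/%d/%Y", tz = "UTC"))

print(is.na(d))
print(is.na(p))
```

Neither result should be NA, since neither depends on the local DST
rules.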

>>
>>
>>> On Wed, Jun 22, 2011 at 2:40 PM, Alexander Shenkin<ashenkin at ufl.edu>
>>> wrote:
>>>> On 6/22/2011 1:34 PM, Sarah Goslee wrote:
>>>>> On Wed, Jun 22, 2011 at 2:28 PM, David
>>>>> Winsemius<dwinsemius at comcast.net>   wrote:
>>>>>>
>>>>>> On Jun 22, 2011, at 2:03 PM, Sarah Goslee wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Wed, Jun 22, 2011 at 11:40 AM, Alexander Shenkin<ashenkin at ufl.edu>
>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> is.na(strptime("5/2/1992", format="%m/%d/%Y"))
>>>>>>>>
>>>>>>>> [1] FALSE
>>>>>>>>>
>>>>>>>>> is.na(strptime("5/3/1992", format="%m/%d/%Y"))
>>>>>>>>
>>>>>>>> [1] TRUE
>>>>>>>
>>>>>>> I can't reproduce your problem on R 2.13.0 on linux:
>>>>>>
>>>>>> I also cannot reproduce it on a Mac with 2.13.0 beta
>>>>>
>>>>> Which strongly suggests that you should start by upgrading your R
>>>>> installation if at all possible.
>>>>>
>>>>> I'd also recommend trying it on a default R session, with no extra
>>>>> packages loaded, and no items in your workspace. It's possible that
>>>>> something else is interfering.
>>>>>
>>>>> On linux, that's achieved by typing R --vanilla at the command line.
>>>>> I'm afraid I don't know how to do it for Windows, but should be
>>>>> similarly straightforward.
>>>>>
>>>> Thanks Sarah.  Still getting the problem.  I should surely upgrade, but
>>>> still, not a bad idea to get to the bottom of this, or at least have it
>>>> documented as a known issue.  BTW, I'm on Windows 7 Pro x64.
>>>>
>>>> (running Rgui.exe --vanilla):
>>>>
>>>>> is.na(strptime("5/3/1992", format="%m/%d/%Y"))
>>>> [1] TRUE
>>>>
>>>>> is.na(strptime("5/2/1992", format="%m/%d/%Y"))
>>>> [1] FALSE
>>>>
>>>>> sessionInfo()
>>>> R version 2.12.1 (2010-12-16)
>>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>>
>>>> locale:
>>>> [1] LC_COLLATE=English_United States.1252
>>>> [2] LC_CTYPE=English_United States.1252
>>>> [3] LC_MONETARY=English_United States.1252
>>>> [4] LC_NUMERIC=C
>>>> [5] LC_TIME=English_United States.1252
>>>>
>>>> attached base packages:
>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>


-- 
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University


