[R] strange date problem - May 3, 1992 is NA
Alexander Shenkin
ashenkin at ufl.edu
Wed Jun 22 23:28:08 CEST 2011
On 6/22/2011 4:09 PM, Brian Diggs wrote:
> On 6/22/2011 1:37 PM, Alexander Shenkin wrote:
>> On 6/22/2011 3:34 PM, Brian Diggs wrote:
>>> On 6/22/2011 12:09 PM, Luke Miller wrote:
>>>> For what it's worth, I cannot reproduce this problem under a nearly
>>>> identical instance of R (R 2.12.1, Win 7 Pro 64-bit). I also can't
>>>> reproduce the problem with R 2.13.0. You've got something truly weird
>>>> going on with your particular instance of R.
>>>>
>>>>
>>>>> is.na(strptime("5/3/1992", format="%m/%d/%Y"))
>>>> [1] FALSE
>>>>> is.na(strptime("5/2/1992", format="%m/%d/%Y"))
>>>> [1] FALSE
>>>>> sessionInfo()
>>>> R version 2.12.1 (2010-12-16)
>>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>>
>>>> locale:
>>>> [1] LC_COLLATE=English_United States.1252
>>>> [2] LC_CTYPE=English_United States.1252
>>>> [3] LC_MONETARY=English_United States.1252
>>>> [4] LC_NUMERIC=C
>>>> [5] LC_TIME=English_United States.1252
>>>>
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods base
>>>>
>>>> other attached packages:
>>>> [1] rj_0.5.2-1 lattice_0.19-17
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] grid_2.12.1 rJava_0.8-8 tools_2.12.1
>>>
>>> Like Luke, I cannot reproduce what you see in (an old installation of)
>>> R 2.12.1 (and it also didn't have rj, lattice, grid, rJava, or tools
>>> attached or loaded in any way).
>>>
>>> My vague gut feeling is that it might be a timezone/daylight saving
>>> time issue (though usually times have to be involved). At least, that
>>> is a common cause of weird things happening with dates.
>>>
>>> What do you get as output for the following?
>>>
>>> Sys.timezone()
>>> Sys.info()
>>> conflicts()
>>> dput(strptime("5/3/1992", format="%m/%d/%Y"))
>>> dput(as.POSIXct(strptime("5/3/1992", format="%m/%d/%Y")))
>>> dput(strptime("5/2/1992", format="%m/%d/%Y"))
>>> dput(as.POSIXct(strptime("5/2/1992", format="%m/%d/%Y")))
>>
>>> Sys.timezone()
>> [1] "COT"
>>> Sys.info()
>>      sysname        release                      version
>>    "Windows"        "7 x64" "build 7601, Service Pack 1"
>>       nodename      machine
>> "machine_name"        "x86"
>>      login           user
>> "username"     "username"
>>> conflicts()
>> [1] "untangle.specials" "body<-"            "format.pval"
>> [4] "round.POSIXt"      "trunc.POSIXt"      "units"
>>> dput(strptime("5/3/1992", format="%m/%d/%Y"))
>> structure(list(sec = 0, min = 0L, hour = 0L, mday = 3L, mon = 4L,
>> year = 92L, wday = 0L, yday = 123L, isdst = -1L), .Names = c("sec",
>> "min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
>> ), class = c("POSIXlt", "POSIXt"))
>>> dput(as.POSIXct(strptime("5/3/1992", format="%m/%d/%Y")))
>> structure(NA_real_, class = c("POSIXct", "POSIXt"), tzone = "")
>>> dput(strptime("5/2/1992", format="%m/%d/%Y"))
>> structure(list(sec = 0, min = 0L, hour = 0L, mday = 2L, mon = 4L,
>> year = 92L, wday = 6L, yday = 122L, isdst = 0L), .Names = c("sec",
>> "min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
>> ), class = c("POSIXlt", "POSIXt"))
>>> dput(as.POSIXct(strptime("5/2/1992", format="%m/%d/%Y")))
>> structure(704782800, class = c("POSIXct", "POSIXt"), tzone = "")
>>
>
> Fun :)
>
> So, not being familiar with COT, I looked it up to see when its
> daylight saving time switchovers are/were.
>
> http://www.timeanddate.com/worldclock/timezone.html?n=41&syear=1990
>
> Daylight saving time started (in 1992 only) at "Midnight between
> Saturday, May 2 and Sunday, May 3" and ended (in 1993) at "Midnight
> between Saturday, April 3 and Sunday, April 4". In particular, the clock
> went from Saturday, May 2, 1992 11:59:59 PM straight to Sunday, May 3,
> 1992 1:00:00 AM, so there was no midnight on May 3. When strptime
> converts a date with no time component, it sets the time to midnight by
> default. That time does not exist under the DST rules (which is why
> isdst gets set to -1), and so the conversion to POSIXct yields NA.
>
> There are probably a lot of places in R that assume midnight is a valid
> time, so I don't know what will and will not work in that timezone (you
> will probably also have problems with seq and cut on POSIXct/POSIXlt
> objects there, at least). I'd recommend using a different timezone or,
> if you don't need times, using Date (which has no timezone and so
> avoids this):
>
> as.Date("5/3/1992", format="%m/%d/%Y")
Thanks for your detective work, Brian! Nice one. I am now using Date,
and so _my_ problem is solved. However, others have surely run into
this problem and will continue to do so (perhaps without even realizing
it, and thus silently tossing away data). Indeed, quite a number of
places appear to switch to DST at midnight:
http://www.google.com/search?q=Midnight+site%3Ahttp%3A%2F%2Fwww.timeanddate.com%2Fworldclock%2Ftimezone.html
I assume all of these timezones would run into a problem similar to mine?
What would be the best route to get this smoothed over in R-core?
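
For anyone who finds this thread later, here is a rough sketch of the two
workarounds discussed above (Date, or a timezone without DST), plus a small
check for dates that get lost in the POSIXct conversion. The helper name
find_lost_dates is made up for illustration and is not part of base R, and
the Olson name "America/Bogota" is my assumption for what Sys.timezone()
reports as "COT". Whether the last call flags anything depends on the R
version and platform, so no output is shown for it.

```r
# Hypothetical helper (not part of base R): returns the input strings that
# parse as a calendar date but come back NA after conversion to POSIXct.
find_lost_dates <- function(x, format, tz = "") {
  as_date <- as.Date(x, format = format)
  as_ct <- as.POSIXct(strptime(x, format = format, tz = tz))
  x[!is.na(as_date) & is.na(as_ct)]
}

dates <- c("5/2/1992", "5/3/1992")
fmt <- "%m/%d/%Y"

# Workaround 1: Date has no time of day and no timezone, so a missing
# midnight cannot affect it.
d <- as.Date(dates, format = fmt)

# Workaround 2: parse in UTC, which has no DST transitions.
ct_utc <- as.POSIXct(strptime(dates, format = fmt, tz = "UTC"))

# Check a vector of date strings against a specific timezone
# ("America/Bogota" is assumed here to correspond to COT).
find_lost_dates(dates, fmt, tz = "America/Bogota")
```

Running the check over a whole column before converting would at least
make the loss visible instead of silent.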
>
>>>
>>>
>>>> On Wed, Jun 22, 2011 at 2:40 PM, Alexander Shenkin <ashenkin at ufl.edu> wrote:
>>>>> On 6/22/2011 1:34 PM, Sarah Goslee wrote:
>>>>>> On Wed, Jun 22, 2011 at 2:28 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>>>>>>>
>>>>>>> On Jun 22, 2011, at 2:03 PM, Sarah Goslee wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On Wed, Jun 22, 2011 at 11:40 AM, Alexander Shenkin <ashenkin at ufl.edu> wrote:
>>>>>>>>>>
>>>>>>>>>> is.na(strptime("5/2/1992", format="%m/%d/%Y"))
>>>>>>>>>
>>>>>>>>> [1] FALSE
>>>>>>>>>>
>>>>>>>>>> is.na(strptime("5/3/1992", format="%m/%d/%Y"))
>>>>>>>>>
>>>>>>>>> [1] TRUE
>>>>>>>>
>>>>>>>> I can't reproduce your problem on R 2.13.0 on linux:
>>>>>>>
>>>>>>> I also cannot reproduce it on a Mac with 2.13.0 beta
>>>>>>
>>>>>> Which strongly suggests that you should start by upgrading your R
>>>>>> installation if at all possible.
>>>>>>
>>>>>> I'd also recommend trying it on a default R session, with no extra
>>>>>> packages loaded, and no items in your workspace. It's possible that
>>>>>> something else is interfering.
>>>>>>
>>>>>> On linux, that's achieved by typing R --vanilla at the command line.
>>>>>> I'm afraid I don't know how to do it for Windows, but should be
>>>>>> similarly straightforward.
>>>>>>
>>>>> Thanks Sarah. Still getting the problem. I should surely upgrade, but
>>>>> still, not a bad idea to get to the bottom of this, or at least have
>>>>> it documented as a known issue. BTW, I'm on Windows 7 Pro x64.
>>>>>
>>>>> (running Rgui.exe --vanilla):
>>>>>
>>>>>> is.na(strptime("5/3/1992", format="%m/%d/%Y"))
>>>>> [1] TRUE
>>>>>
>>>>>> is.na(strptime("5/2/1992", format="%m/%d/%Y"))
>>>>> [1] FALSE
>>>>>
>>>>>> sessionInfo()
>>>>> R version 2.12.1 (2010-12-16)
>>>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>>>
>>>>> locale:
>>>>> [1] LC_COLLATE=English_United States.1252
>>>>> [2] LC_CTYPE=English_United States.1252
>>>>> [3] LC_MONETARY=English_United States.1252
>>>>> [4] LC_NUMERIC=C
>>>>> [5] LC_TIME=English_United States.1252
>>>>>
>>>>> attached base packages:
>>>>> [1] stats graphics grDevices utils datasets methods base
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>
>