[Rd] Invalid date-times and as.POSIXct problems (remotely related to DST issues)

Karl Ove Hufthammer karl at huftis.org
Mon Mar 12 15:29:14 CET 2012


I think this should be handled as a bug, but I’m not sure which
platforms and versions it applies to, so I’m writing to this list. The
problem is that as.POSIXct on character strings behaves in a strange way
if one of the date-times is invalid: it converts all the date-times to
dates (i.e., it discards the time part).

Example, which I suspect only works with settings like mine, i.e., in a
UTC+1/UTC+2 time zone:

  $ dates=c("2003-10-13 00:15:00", "2008-06-03 14:45:00", "2003-03-30 02:00:00")

Note that the last date-time doesn’t actually exist in this time zone
(due to daylight saving time, the clocks jumped from 02:00 to 03:00):
http://www.timeanddate.com/worldclock/meetingtime.html?day=30&month=3&year=2003&p1=187&iv=0

  $ d12=as.POSIXct(dates)
  $ d123=as.POSIXct(dates[1:2])
  $ d12
  [1] "2003-10-13 CEST" "2008-06-03 CEST" "2003-03-30 CET"
  $ d123
  [1] "2003-10-13 00:15:00 CEST" "2008-06-03 14:45:00 CEST"

When I include all values, they are all converted to (POSIXct) *dates*,
but if I exclude the invalid one, the rest are properly converted to
(POSIXct) date-times. Note that this is not just a display issue:

  $ unclass(d12)
  [1] 1065996000 1212444000 1048978800
  attr(,"tzone")
  [1] ""
  $ unclass(d123)
  [1] 1065996900 1212497100
  attr(,"tzone")
  [1] ""
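
The size of the discrepancy can be checked directly from the numbers
above. This is only a small arithmetic sketch using the printed values;
the variable names are made up for illustration:

```r
# Seconds since the epoch for the first two strings, copied from the output above.
with_invalid    <- c(1065996000, 1212444000)  # converted together with the invalid string
without_invalid <- c(1065996900, 1212497100)  # converted on their own

# The difference is exactly the time of day that was thrown away.
lost_seconds <- without_invalid - with_invalid
lost_seconds       # 900 53100
lost_seconds / 60  # 15 885, i.e. 00:15 and 14:45
```

So the underlying values really are midnight of each date, not just a
shortened printout.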

I can only reproduce this on Windows; on Linux all the strings are
converted to date-times (the last one to 2003-03-30 01:00:00 CET).
However, if one specifies a completely invalid time, e.g., 25:00, the
same thing does happen on Linux (2.14.2 Patched). I think the right/best
behaviour would be to convert the invalid date-time string to NA,
convert the other ones to proper POSIXct date-times, and perhaps issue a
warning about NAs being generated.
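
A minimal sketch of the behaviour proposed above, as a user-level
workaround rather than a fix to as.POSIXct itself. The helper name
parse_datetimes and its default format are assumptions made for
illustration. It parses every string with one fixed format, so an
unparseable element (such as 25:00) becomes NA without affecting the
others. Whether a time that falls in a daylight saving gap also becomes
NA depends on the platform, so this sketch does not promise to catch
that case; the example uses tz = "UTC" to stay independent of the local
time zone.

```r
# Hypothetical helper (not part of base R): parse with a single fixed format
# so that a bad element becomes NA instead of changing how the rest are parsed.
parse_datetimes <- function(x, format = "%Y-%m-%d %H:%M:%S", tz = "") {
  parsed <- as.POSIXct(strptime(x, format = format, tz = tz))
  # Elements that were supplied but could not be interpreted.
  bad <- is.na(parsed) & !is.na(x)
  if (any(bad)) {
    warning(sum(bad), " invalid date-time string(s) converted to NA: ",
            paste(x[bad], collapse = ", "))
  }
  parsed
}

dates <- c("2003-10-13 00:15:00", "2008-06-03 14:45:00", "2003-03-30 25:00:00")
res <- parse_datetimes(dates, tz = "UTC")  # warns about the third string
is.na(res)                                 # FALSE FALSE TRUE
```

The valid strings keep their time part, and the caller is told which
inputs were rejected.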

(I originally discovered this problem on data from an Oracle database,
using sqlQuery() from the RODBC package, which automatically converts
date-times to date-times in the current time zone (unless you specify
as.is=TRUE), and was surprised that for some queries the date-times were
truncated to dates. A warning that parts of the data were invalid would
be very welcome.)


Version details (for Windows):

$ version
                _
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          2
minor          14.2
year           2012
month          02
day            29
svn rev        58522
language       R
version.string R version 2.14.2 (2012-02-29)

$ sessionInfo()
R version 2.14.2 (2012-02-29)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Norwegian-Nynorsk_Norway.1252
[2] LC_CTYPE=Norwegian-Nynorsk_Norway.1252
[3] LC_MONETARY=Norwegian-Nynorsk_Norway.1252
[4] LC_NUMERIC=C
[5] LC_TIME=Norwegian-Nynorsk_Norway.1252

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

-- 
Karl Ove Hufthammer


