strptime {base} | R Documentation |
Date-time Conversion Functions to and from Character
Description
Functions to convert between character representations and objects of classes "POSIXlt" and "POSIXct" representing calendar dates and times.
Usage
## S3 method for class 'POSIXct'
format(x, format = "", tz = "", usetz = FALSE, ...)
## S3 method for class 'POSIXlt'
format(x, format = "", usetz = FALSE,
digits = getOption("digits.secs"), ...)
## S3 method for class 'POSIXt'
as.character(x, digits = if(inherits(x, "POSIXlt")) 14L else 6L,
OutDec = ".", ...)
strftime(x, format = "", tz = "", usetz = FALSE, ...)
strptime(x, format, tz = "")
Arguments
x |
an object to be converted: a character vector for
|
tz |
a character string specifying the time zone to be used for
the conversion. System-specific (see |
format |
a character string. The default for the |
... |
further arguments to be passed from or to other methods. |
usetz |
logical. Should the time zone abbreviation be appended
to the output? This is used in printing times, and more reliable
than using |
digits |
integer determining the |
OutDec |
a 1-character string specifying the decimal point to be
used; the default is not |
Details
The format and as.character methods and strftime convert objects from the classes "POSIXlt" and "POSIXct" to character vectors.
strptime converts character vectors to class "POSIXlt": its input x is first converted by as.character.
Each input string is processed as far as necessary for the format
specified: any trailing characters are ignored.
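As a minimal sketch of the trailing-character rule, the call below parses only the date part of a longer string. The input string and the choice of tz = "UTC" are illustrative assumptions.

```r
# Parse only as much of the string as the format needs; the rest is ignored.
x <- strptime("2024-03-15 and some trailing text", "%Y-%m-%d", tz = "UTC")
class(x)    # a "POSIXlt" object
format(x)   # the parsed date, without the trailing text
```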
strftime is a wrapper for format.POSIXlt, and it and format.POSIXct first convert to class "POSIXlt" by calling as.POSIXlt (so they also work for class "Date"). Note that only that conversion depends on the time zone. Since R version 4.2.0, as.POSIXlt() conversion now treats the non-finite numeric -Inf, Inf, NA and NaN differently (where previously all were treated as NA). Also the format() method for POSIXlt now treats these different non-finite times and dates analogously to type double.
The usual vector re-cycling rules are applied to x and format, so the answer has the length of the longer of these vectors.
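The recycling rule can be illustrated with a short sketch. The input strings and variable names here are illustrative assumptions; tz = "UTC" is used only to keep the result independent of the local time zone.

```r
# One input string tried against two formats: the result has length 2.
one_input <- strptime("2024-03-15", c("%Y-%m-%d", "%d/%m/%Y"), tz = "UTC")
length(one_input)
is.na(one_input)   # the second format does not match, so that element is NA

# Two inputs paired element-wise with two formats.
paired <- strptime(c("2024-03-15", "15/03/2024"),
                   c("%Y-%m-%d", "%d/%m/%Y"), tz = "UTC")
format(paired)
```

Supplying a vector of candidate formats against a single string is one way to see which format matches, although it does not by itself pick a winner.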
Locale-specific conversions to and from character strings are used where appropriate and available. This affects the names of the days and months, the AM/PM indicator (if used) and the separators in output formats such as %x and %X, via the setting of the LC_TIME locale category. The ‘current locale’ of the descriptions might mean the locale in use at the start of the R session or when these functions are first used. (For input, the locale-specific conversions can be changed by calling Sys.setlocale with category LC_TIME (or LC_ALL). For output, what happens depends on the OS but usually works.)
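A hedged sketch of switching the LC_TIME category for input follows. It assumes that the "C" locale is available (it normally is) and that restoring the saved setting succeeds on the platform in use; the variable names are illustrative.

```r
# Remember the current LC_TIME setting so it can be restored afterwards.
old_locale <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "C")

# With LC_TIME set to "C", English month abbreviations are expected on input.
parsed <- strptime("15 Mar 2024", "%d %b %Y", tz = "UTC")
format(parsed, "%Y-%m-%d")

# Restore the previous setting.
Sys.setlocale("LC_TIME", old_locale)
```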
The details of the formats are platform-specific, but the following are likely to be widely available: most are defined by the POSIX standard. A conversion specification is introduced by %, usually followed by a single letter or O or E and then a single letter. Any character in the format string not part of a conversion specification is interpreted literally (and %% gives %). Widely implemented conversion specifications include
%a
Abbreviated weekday name in the current locale on this platform. (Also matches full name on input: in some locales there are no abbreviations of names.)
%A
Full weekday name in the current locale. (Also matches abbreviated name on input.)
%b
Abbreviated month name in the current locale on this platform. (Also matches full name on input: in some locales there are no abbreviations of names.)
%B
Full month name in the current locale. (Also matches abbreviated name on input.)
%c
Date and time. Locale-specific on output, "%a %b %e %H:%M:%S %Y" on input.
%C
Century (00–99): the integer part of the year divided by 100.
%d
Day of the month as decimal number (01–31).
%D
Date format such as %m/%d/%y: the C99 standard says it should be that exact format (but not all OSes comply).
%e
Day of the month as decimal number (1–31), with a leading space for a single-digit number.
%F
Equivalent to %Y-%m-%d (the ISO 8601 date format).
%g
The last two digits of the week-based year (see %V). (Accepted but ignored on input.)
%G
The week-based year (see %V) as a decimal number. (Accepted but ignored on input.)
%h
Equivalent to %b.
%H
Hours as decimal number (00–23). As a special exception strings such as ‘24:00:00’ are accepted for input, since ISO 8601 allows these.
%I
Hours as decimal number (01–12).
%j
Day of year as decimal number (001–366): For input, 366 is only valid in a leap year.
%m
Month as decimal number (01–12).
%M
Minute as decimal number (00–59).
%n
Newline on output, arbitrary whitespace on input.
%p
AM/PM indicator in the locale. Used in conjunction with %I and not with %H. An empty string in some locales (for example on some OSes, non-English European locales including Russia). The behaviour is undefined if used for input in such a locale.
Some platforms accept %P for output, which uses a lower-case version (%p may also use lower case): others will output P.
%r
For output, the 12-hour clock time (using the locale's AM or PM): only defined in some locales, and on some OSes misleading in locales which do not define an AM/PM indicator. For input, equivalent to %I:%M:%S %p.
%R
Equivalent to %H:%M.
%S
Second as integer (00–61), allowing for up to two leap-seconds (but POSIX-compliant implementations will ignore leap seconds).
%t
Tab on output, arbitrary whitespace on input.
%T
Equivalent to %H:%M:%S.
%u
Weekday as a decimal number (1–7, Monday is 1).
%U
Week of the year as decimal number (00–53) using Sunday as the first day of the week (and typically with the first Sunday of the year as day 1 of week 1). The US convention.
%V
Week of the year as decimal number (01–53) as defined in ISO 8601. If the week (starting on Monday) containing 1 January has four or more days in the new year, then it is considered week 1. Otherwise, it is the last week of the previous year, and the next week is week 1. See %G (%g) for the year corresponding to the week given by %V. (Accepted but ignored on input.)
%w
Weekday as decimal number (0–6, Sunday is 0).
%W
Week of the year as decimal number (00–53) using Monday as the first day of week (and typically with the first Monday of the year as day 1 of week 1). The UK convention.
%x
Date. Locale-specific on output, "%y/%m/%d" on input.
%X
Time. Locale-specific on output, "%H:%M:%S" on input.
%y
Year without century (00–99). On input, values 00 to 68 are prefixed by 20 and 69 to 99 by 19 – that is the behaviour specified by the 2018 POSIX standard, but it does also say ‘it is expected that in a future version the default century inferred from a 2-digit year will change’.
%Y
Year with century. Note that whereas there was no zero in the original Gregorian calendar, ISO 8601:2004 defines it to be valid (interpreted as 1BC): see https://en.wikipedia.org/wiki/0_(year). However, the standards also say that years before 1582 in its calendar should only be used with agreement of the parties involved.
For input, only years 0:9999 are accepted.
%z
Signed offset in hours and minutes from UTC, so -0800 is 8 hours behind UTC. (Standard only for output. For input R currently supports it on all platforms – values from -1400 to +1400 are accepted.)
%Z
(Output only.) Time zone abbreviation as a character string (empty if not available). This may not be reliable when a time zone has changed abbreviations over the years.
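A few of the specifications above can be tried on a fixed instant. This is only a sketch: the date-time value and the variable names are illustrative assumptions, and only locale-independent numeric specifications are used so that the results do not depend on LC_TIME.

```r
x <- as.POSIXct("2024-03-15 14:05:09", tz = "UTC")   # a Friday

format(x, "%F %T")   # ISO 8601 date, then %H:%M:%S
format(x, "%j")      # day of the year
format(x, "%u")      # weekday number, Monday is 1
format(x, "%V")      # ISO 8601 week of the year

# Two-digit years on input: 00 to 68 map to 20xx, 69 to 99 map to 19xx.
y68 <- strptime("01/02/68", "%d/%m/%y", tz = "UTC")
y69 <- strptime("01/02/69", "%d/%m/%y", tz = "UTC")
c(y68$year, y69$year) + 1900   # the year component is stored as years since 1900
```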
Where leading zeros are shown they will be used on output but are optional on input. Names are matched case-insensitively on input: whether they are capitalized on output depends on the platform and the locale. Note that abbreviated names are platform-specific (although the standards specify that in the ‘C’ locale they must be the first three letters of the capitalized English name: this convention is widely used in English-language locales but for example the French month abbreviations are not the same on any two of Linux, macOS, Solaris and Windows). Knowing what the abbreviations are is essential if you wish to use %a, %b or %h as part of an input format: see the examples for how to check.
When %z or %Z is used for output with an object with an assigned time zone an attempt is made to use the values for that time zone, but it is not guaranteed to succeed.
The definition of ‘whitespace’ for %n and %t is platform-dependent: for most it does not include non-breaking spaces.
Not in the standards and less widely implemented are
%k
The 24-hour clock time with single digits preceded by a blank.
%l
The 12-hour clock time with single digits preceded by a blank.
%s
(Output only.) The number of seconds since the epoch.
%+
(Output only.) Similar to %c, often "%a %b %e %H:%M:%S %Z %Y". May depend on the locale.
For output there are also %O[dHImMUVwWy] which may emit numbers in an alternative locale-dependent format (e.g., roman numerals), and %E[cCyYxX] which can use an alternative ‘era’ (e.g., a different religious calendar). Which of these are supported is OS-dependent. These are accepted for input, but with the standard interpretation.
Specific to R is %OSn, which for output gives the seconds truncated to 0 <= n <= 6 decimal places (and if %OS is not followed by a digit, it uses the setting of getOption("digits.secs"), or if that is unset, n = 0). Further, for strptime %OS will input seconds including fractional seconds. Note that %S does not produce fractional parts on output.
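The difference between %S and %OSn can be sketched as follows. The input value is an illustrative assumption, chosen so that the fractional part (0.25) is exactly representable in binary floating point.

```r
# %OS on input reads the fractional seconds.
x <- strptime("2024-03-15 14:05:09.250", "%Y-%m-%d %H:%M:%OS", tz = "UTC")
x$sec                     # seconds including the fractional part

format(x, "%H:%M:%S")     # %S: whole seconds only
format(x, "%H:%M:%OS3")   # %OS3: three decimal places
```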
The behaviour of other conversion specifications (and even if other character sequences commencing with % are conversion specifications) is system-specific. Some systems document that the use of multi-byte characters in format is unsupported: UTF-8 locales are unlikely to cause a problem.
Value
The format methods and strftime return character vectors representing the time. NA times are returned as NA_character_.
strptime turns character representations into an object of class "POSIXlt". The time zone is used to set the isdst component and to set the "tzone" attribute if tz != "". If the specified time is invalid (for example ‘"2010-02-30 08:00"’) all the components of the result are NA. (NB: this means exactly what it says – if it is an invalid time, not just a time that does not exist in some time zone.)
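The handling of an invalid time can be checked with a small sketch. The variable names are illustrative; the invalid string is the one quoted in the text, and tz = "UTC" is an assumption made so that the "tzone" attribute is set.

```r
valid   <- strptime("2010-02-20 08:00", "%Y-%m-%d %H:%M", tz = "UTC")
invalid <- strptime("2010-02-30 08:00", "%Y-%m-%d %H:%M", tz = "UTC")  # no 30 February

is.na(valid)           # a valid time is not NA
is.na(invalid)         # the invalid time is NA
attr(valid, "tzone")   # set because tz != ""
format(invalid)        # NA times format as NA_character_
```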
Printing years
Everyone agrees that years from 1000 to 9999 should be printed with 4 digits, but the standards do not define what is to be done outside that range. For years 0 to 999 most OSes pad with zeros or spaces to 4 characters, but Linux/glibc outputs just the number.
OS facilities will probably not print years before 1 CE (aka 1 AD) ‘correctly’ (they tend to assume the existence of a year 0: see https://en.wikipedia.org/wiki/0_(year), and some OSes get them completely wrong). Common formats are -45 and -045.
Years after 9999 and before -999 are normally printed with five or more characters.
Some platforms support modifiers from POSIX 2008 (and others). On Linux/glibc the format "%04Y" assures a minimum of four characters and zero-padding (the default is no padding). The internal code (as used on Windows and by default on macOS) uses zero-padding by default (this can be controlled by environment variable R_PAD_YEARS_BY_ZERO). On those platforms, formats %04Y, %_4Y and %_Y can be used for zero, space and no padding respectively. (On macOS, the native code (not the default) supports none of these and uses zero-padding to 4 digits.)
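The platform dependence described above can be observed with a sketch like the following. The date is an arbitrary illustrative value, and no particular output is claimed because the padding differs between platforms.

```r
x <- as.Date("0045-03-15")   # year 45 CE; value chosen only for illustration

format(x, "%Y")     # may be "45", "0045" or space-padded, depending on the platform
format(x, "%04Y")   # zero-padded to four characters where the modifier is supported
```

When a fixed-width year matters, checking the output of both calls on the target platform is safer than assuming either form.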
Time zone offsets
Offsets from GMT (also known as UTC) are part of the conversion between timezones and to/from class "POSIXct", but cause difficulties as they are often computed incorrectly.
They conventionally have the opposite sign from time-zone specifications (see Sys.timezone): positive values are East of the meridian. Although there have been time zones with offsets like +00:09:21 (Paris in 1900), and -00:44:30 (Liberia until 1972), offsets are usually treated as whole numbers of minutes, and are most often seen in RFC 5322 email headers in forms like -0800 (e.g., used on the Pacific coast of the USA in winter).
Format %z can be used for input or output: it is a character string, conventionally plus or minus followed by two digits for hours and two for minutes: the standards say that an empty string should be output if the offset is undetermined, but some systems use +0000 or the offsets for the time zone in use for the current year. (On some platforms this works better after conversion to "POSIXct". Some platforms only recognize hour or half-hour offsets for output.)
Using %z for input makes most sense with tz = "UTC".
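A minimal sketch of reading an offset with %z and tz = "UTC": the timestamp is an illustrative value, and the offset is applied so that the result is expressed in UTC.

```r
# 14:36:38 at an offset of -0400 is 18:36:38 in UTC.
x <- strptime("2010-03-23 14:36:38 -0400", "%Y-%m-%d %H:%M:%S %z", tz = "UTC")
format(x, "%Y-%m-%d %H:%M:%S", usetz = TRUE)
```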
Sources
Input uses the POSIX function strptime and output the C99 function strftime.
However, not all OSes (notably Windows) provided strptime and many issues were found for those which did, so since 2000 R has used a fork of code from ‘glibc’. The forked code uses the system's strftime to find the locale-specific day and month names and any AM/PM indicator.
On some platforms (including Windows and by default on macOS) the system's strftime is replaced (along with most of the rest of the C-level datetime code) by code modified from IANA's ‘tzcode’ distribution (https://www.iana.org/time-zones).
Note that as strftime is used for output (and not wcsftime), argument format is translated if necessary to the session encoding.
Note
The default formats follow the rules of the ISO 8601 international standard which expresses a day as "2001-02-28" and a time as "14:01:02" using leading zeroes as here. (The ISO form uses no space, possibly ‘T’, to separate dates and times: R uses a space by default.)
For strptime the input string need not specify the date completely: it is assumed that unspecified seconds, minutes or hours are zero, and an unspecified year, month or day is the current one. (However, if a month is specified, the day of that month has to be specified by %d or %e since the current day of the month need not be valid for the specified month.) Some components may be returned as NA (but an unknown tzone component is represented by an empty string).
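The defaults for unspecified components can be sketched as below. The input strings and variable names are illustrative assumptions; the first result depends on the day on which the code is run.

```r
# Only hours and minutes given: seconds default to zero, the date to today.
time_only <- strptime("14:05", "%H:%M")
c(time_only$hour, time_only$min, time_only$sec)
format(time_only, "%Y-%m-%d")   # the current date in the current time zone

# A month without a day of the month is not enough to form a date.
month_only <- strptime("2024-02", "%Y-%m", tz = "UTC")
is.na(month_only)
```

A common workaround for year-month strings is to paste a fixed day onto the input and add %d to the format.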
If the time zone specified is invalid on your system, what happens is system-specific but it will probably be ignored.
Remember that in most time zones some times do not occur and some occur twice because of transitions to/from ‘daylight saving’ (also known as ‘summer’) time. strptime does not validate such times (it does not assume a specific time zone), but conversion by as.POSIXct will do so. Conversion by strftime and formatting/printing uses OS facilities and may return nonsensical results for non-existent times at DST transitions.
In a C locale %c is required to be "%a %b %e %H:%M:%S %Y". As Windows does not comply (and uses a date format not understood outside N. America), that format is used by R on Windows in all locales.
There is a limit of 2048 bytes on each string produced by strftime and the format methods. As from R 4.3.0 attempting to exceed this is an error (previous versions silently truncated at 255 bytes).
References
International Organization for Standardization (2004, 2000, ...) ‘ISO 8601. Data elements and interchange formats – Information interchange – Representation of dates and times.’, slightly updated to International Organization for Standardization (2019) ‘ISO 8601-1:2019. Date and time – Representations for information interchange – Part 1: Basic rules’, and further amended in 2022. For links to versions available on-line see (at the time of writing) https://dotat.at/tmp/ISO_8601-2004_E.pdf and https://www.qsl.net/g1smd/isopdf.htm; for information on the current official version, see https://www.iso.org/iso/iso8601 and https://en.wikipedia.org/wiki/ISO_8601.
The POSIX 1003.1 standard, which is in some respects stricter than ISO 8601.
See Also
DateTimeClasses for details of the date-time classes; locales to query or set a locale.
Your system's help page on strftime to see how to specify their formats. (On some systems, including Windows, strftime is replaced by more comprehensive internal code.)
Examples
## locale-specific version of date()
format(Sys.time(), "%a %b %d %X %Y %Z")
## time to sub-second accuracy (if supported by the OS)
format(Sys.time(), "%H:%M:%OS3")
## read in date info in format 'ddmmmyyyy'
## This will give NA(s) in some non-English locales; setting the C locale
## as in the commented lines will overcome this on most systems.
## lct <- Sys.getlocale("LC_TIME"); Sys.setlocale("LC_TIME", "C")
x <- c("1jan1960", "2jan1960", "31mar1960", "30jul1960")
z <- strptime(x, "%d%b%Y")
## Sys.setlocale("LC_TIME", lct)
z
(chz <- as.character(z)) # same w/o TZ
## *here* (but not in general), the same as format():
stopifnot(exprs = {
identical(chz, format(z))
grepl("^1960-0[137]-[03][012]$", chz[!is.na(z)])
})
## read in date/time info in format 'm/d/y h:m:s'
dates <- c("02/27/92", "02/27/92", "01/14/92", "02/28/92", "02/01/92")
times <- c("23:03:20", "22:29:56", "01:03:30", "18:21:03", "16:56:26")
x <- paste(dates, times)
z2 <- strptime(x, "%m/%d/%y %H:%M:%S")
z2
## *here* (but not in general), the same as format():
stopifnot(identical(format(z2), as.character(z2)))
## time with fractional seconds
z3 <- strptime("20/2/06 11:16:16.683", "%d/%m/%y %H:%M:%OS")
z3 # prints without fractional seconds by default, digits.secs = NULL ("= 0")
op <- options(digits.secs = 3)
z3 # shows the 3 extra digits
as.character(z3) # ditto
options(op)
## time zone names are not portable, but 'EST5EDT' comes pretty close.
## (but its interpretation may not be universal: see ?timezones)
z4 <- strptime(c("2006-01-08 10:07:52", "2006-08-07 19:33:02"),
"%Y-%m-%d %H:%M:%S", tz = "EST5EDT")
z4
attr(z4, "tzone")
as.character(z4)
z4$sec[2] <- pi # "very" fractional seconds
as.character(z4) # shows full precision
format(z4) # no fractional sec
format(z4, digits=8) # shows only 6 (hard-wired maximum)
format(z4, digits=4)
## An RFC 5322 header (Eastern Canada, during DST)
## In a non-English locale the commented lines may be needed.
## prev <- Sys.getlocale("LC_TIME"); Sys.setlocale("LC_TIME", "C")
strptime("Tue, 23 Mar 2010 14:36:38 -0400", "%a, %d %b %Y %H:%M:%S %z")
## Sys.setlocale("LC_TIME", prev)
## Make sure you know what the abbreviated names are for you if you wish
## to use them for input (they are matched case-insensitively):
format(s1 <- seq.Date(as.Date('1978-01-01'), by = 'day', len = 7), "%a")
format(s2 <- seq.Date(as.Date('2000-01-01'), by = 'month', len = 12), "%b")
## Non-finite date-times :
format(as.POSIXct(Inf)) # "Inf" (was NA in R <= 4.1.x)
format(as.POSIXlt(c(-Inf,Inf,NaN,NA))) # were all NA