[R] Error as.Date on Invalid Dates
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Sat Jan 24 09:10:16 CET 2009
Marie Sivertsen wrote:
> I am relatively new to R, so maybe I am missing something, but I have now
> tried as.Date and have trouble understanding how it works (or
> doesn't work, as it seems).
>
>
> Brian D Ripley wrote:
>> On Thu, 22 Jan 2009, Terry Therneau wrote:
>>> One idea is to use the as.date function, for the older (and less capable) 'date'
>>> class. This is currently loaded by default with library(survival). It returns
>>> NA for an invalid date rather than dying.
>> So does as.Date **if you specify the format** (as you have to with your as.date:
>> it has a default one):
>
>
> as.Date("2001/1/1")
> Works fine
>
> as.Date("1/1/2001")
> Prints "1-01-20" ???
>
> as.Date("13/1/2001")
> Prints "13-01-20" ???
>
> as.Date("1/13/2001")
> Prints error: not in a standard unambiguous format
>
> It seems as if both "1/1/2001" and "13/1/2001" were considered by
> R to be in a standard unambiguous format (otherwise an error would be
> reported?), and yet they are parsed incorrectly, contrary to what one
> would think is obvious. It is also surprising that not only "13/1/2001"
> but also "1/2/2001" and "2/1/2001" are parsed successfully but
> incorrectly, as if they were unambiguous, and yet "1/13/2001" is
> ambiguous, though there is really just one way to parse it meaningfully.
>
> I think the strings that are incorrectly parsed should raise errors,
> and the last example should be parsed successfully. What is the reason
> for this behavior?
There are two issues:
a) as.Date ignores trailing characters. This is what causes it to read
trailing 4-digit years as "20" or "19". I.e., under the "%Y/%m/%d" format,
2/1/2001 makes sense as 2/1/20 (January 20, year 2 AD) followed by "01".
This is a documented feature, although its usefulness may not be clear
to you. I suspect that the point is that you sometimes get odd date
strings, say
"Jan 24, 2009 - pd"
and you don't want to have to add code to strip off the unneeded part.
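A minimal R sketch of point (a), assuming a recent R in which as.Date() without a format tries "%Y-%m-%d" and then "%Y/%m/%d". The helper year_of() is hypothetical, added only to expose the parsed year, and the script switches LC_TIME to the "C" locale (an assumption) so that %b matches the English "Jan".

```r
# Assumption: switch LC_TIME to "C" so that %b matches the English "Jan".
invisible(Sys.setlocale("LC_TIME", "C"))

# Hypothetical helper: the calendar year of a Date as an integer.
year_of <- function(x) as.POSIXlt(x)$year + 1900L

# With no format, "2/1/2001" matches "%Y/%m/%d" as year 2, month 1,
# day 20; the trailing "01" is silently ignored.
d <- as.Date("2/1/2001")
year_of(d)          # 2
format(d, "%m-%d")  # "01-20"

# The quoted examples behave the same way: "13/1/2001" becomes
# January 20 of year 13.
year_of(as.Date("13/1/2001"))  # 13

# With an explicit format, the same leniency skips the trailing comment.
as.Date("Jan 24, 2009 - pd", format = "%b %d, %Y")  # "2009-01-24"
```

Note that Sys.setlocale() changes the session's locale; in real code you would restore the previous setting afterwards.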
b) The error message could be better. A format is never ambiguous, but
no format is uniquely determined by a string ("03-02-01" is always a
problem), and we're not trying to auto-detect anyway. What the message
really means is that the string is clearly neither "%Y-%m-%d" nor
"%Y/%m/%d".
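A sketch of the difference between the error and an NA, under the same assumption of a recent R (and English messages). strict_dmy() is a hypothetical helper, not part of base R: it checks the whole string against a d/m/yyyy pattern before parsing, so trailing text gives NA instead of being silently dropped, which is closer to what was asked for above.

```r
# Without a format, a string matching neither "%Y-%m-%d" nor "%Y/%m/%d"
# is an error rather than NA.
res <- tryCatch(as.Date("1/13/2001"), error = function(e) conditionMessage(e))
res  # "character string is not in a standard unambiguous format"

# With an explicit format, strings that do not parse become NA.
x <- c("13/1/2001", "1/13/2001", "1/1/2001")
y <- as.Date(x, format = "%d/%m/%Y")
y  # 2001-01-13, NA, 2001-01-01

# Hypothetical helper: accept only strings that are entirely d/m/yyyy,
# then parse; anything else (including trailing text) becomes NA.
strict_dmy <- function(x) {
  ok <- grepl("^[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}$", x)
  as.Date(ifelse(ok, x, NA_character_), format = "%d/%m/%Y")
}

# "31/2/2001" passes the pattern but is not a real date, so it is NA too.
strict_dmy(c("13/1/2001", "13/1/2001 junk", "31/2/2001"))
```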
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907