[R] Error as.Date on Invalid Dates

Peter Dalgaard p.dalgaard at biostat.ku.dk
Sat Jan 24 09:10:16 CET 2009


Marie Sivertsen wrote:
> I am relatively new to R, so maybe I am missing something, but I have
> now tried as.Date and have problems understanding how it works (or
> doesn't work, as it seems).
> 
> 
> Brian D Ripley wrote:
>> On Thu, 22 Jan 2009, Terry Therneau wrote:
>>> One idea is to use the as.date function, for the older (and less capable) 'date'
>>> class.  This is currently loaded by default with library(survival).  It returns
>>> NA for an invalid date rather than dying.
>> So does as.Date **if you specify the format** (as you have to with your as.date:
>> it has a default one):
> 
> 
> as.Date("2001/1/1")
> Works fine
> 
> as.Date("1/1/2001")
> Prints "1-01-20" ???
> 
> as.Date("13/1/2001")
> Prints "13-01-20" ???
> 
> as.Date("1/13/2001")
> Prints error: not in a standard unambiguous format
> 
> It seems as if both "1/1/2001" and "13/1/2001" were considered by R to
> be in a standard unambiguous format (otherwise an error would be
> reported?), and yet they are parsed incorrectly according to what one
> would think is obvious. It is also surprising that not only "13/1/2001"
> but also "1/2/2001" and "2/1/2001" are parsed successfully but
> incorrectly, as if they were unambiguous, and yet "1/13/2001" is
> reported as ambiguous, though there is really just one way to parse it
> meaningfully.
> 
> I think the strings that are incorrectly parsed should raise errors,
> and the last example should be parsed successfully.  What is the reason
> for the observed behaviour?


There are two issues:

a) as.Date ignores trailing characters. This is what causes it to read 
the first two digits of a trailing four-digit year as the day, "20" or 
"19". I.e., "2/1/2001" matches "%Y/%m/%d" as 2/1/20 (January 20, year 
2 AD) followed by an ignored "01". This is a documented feature, 
although the usefulness may not be clear to you. I suspect that 
the point is that you sometimes get odd date strings, say
"Jan 24, 2009 - pd"
and you don't want to have to add code to strip off the unneeded part.
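
The trailing-character behaviour in (a) can be reproduced with a minimal sketch like the one below. It assumes a reasonably recent R in which as.Date without a format tries "%Y-%m-%d" and then "%Y/%m/%d"; the variable names d1, d2 and d3 are only for illustration. Year and month-day are checked separately because the way very early years are printed can differ between platforms.

```r
# Default parsing: leftover characters after the matched date are ignored.
d1 <- as.Date("2009-01-24 - pd")   # trailing " - pd" is dropped
d2 <- as.Date("2/1/2001")          # matched as year 2, month 1, day 20; "01" dropped

# With an explicit format the same string is read as intended.
d3 <- as.Date("2/1/2001", format = "%d/%m/%Y")

format(d1)                    # "2009-01-24"
format(d2, "%m-%d")           # "01-20"
as.POSIXlt(d2)$year + 1900    # 2
format(d3)                    # "2001-01-02"
```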

b) The error message could be better. A format is never ambiguous, and 
no string uniquely determines its own format ("03-02-01" is always a 
problem), and we're not trying to auto-detect anyway. What the message 
really means is that the string is clearly not "%Y-%m-%d" or "%Y/%m/%d".

-- 
    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
