[R] Checking dates for entry errors

Paul Miller pjmiller_57 at yahoo.com
Wed Jan 11 23:07:49 CET 2012


Hello Everyone,
 
I have a question about how best to check dates for entry errors. I recently discovered that, at least under some circumstances, R will read the incorrectly entered date "11/23/21931" without producing a warning or an error message.
 
> as.Date("11/23/21931", format = "%m/%d/%Y")
[1] "2193-11-23"

> as.Date("21931-11-23")
Error in charToDate(x) : 
  character string is not in a standard unambiguous format
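
One way the extra trailing digit might be caught is a round-trip check: parse the string, format the result back with the same format, and compare it with what was typed. This is only a sketch. The helper name round_trips is hypothetical, and it assumes the entries use zero-padded months and days (an entry like "1/5/2011" would also be flagged).

```r
# Hypothetical helper: TRUE where re-formatting the parsed date
# reproduces the original string exactly.
round_trips <- function(x, fmt = "%m/%d/%Y") {
  parsed <- as.Date(x, format = fmt)
  # format() of an NA date is NA, so guard with !is.na(parsed) first
  !is.na(parsed) & format(parsed, fmt) == x
}

round_trips(c("11/23/21931", "11/23/1931"))
# [1] FALSE  TRUE
```

The first value fails because it parses to 2193-11-23, which formats back as "11/23/2193" rather than the string that was entered.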

Similarly, under some circumstances, R will convert an impossible date like February 31, 2011 to NA rather than issuing a warning.

> as.Date("02/31/2011", format = "%m/%d/%Y")
[1] NA

> as.Date("2011-02-31")
Error in charToDate(x) : 
  character string is not in a standard unambiguous format
 
In the first of these two calls, the date silently becomes NA, so one could easily lose it rather than recognizing that it is in error and needs to be corrected.
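
A minimal sketch of how such silently lost values might be found: compare missingness before and after conversion, and list the raw strings that were present but parsed to NA. The vector dob_raw below is made-up example data, not my real data.

```r
# Made-up example data: one impossible date, one good date, one truly missing value
dob_raw <- c("02/31/2011", "03/15/1957", NA)

dob <- as.Date(dob_raw, format = "%m/%d/%Y")

# Entries that were supplied but could not be converted
bad <- !is.na(dob_raw) & is.na(dob)
dob_raw[bad]
# [1] "02/31/2011"
```

This separates values that were never entered from values that were entered but are not real dates.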
 
So my question is how best to check these sorts of date values.
 
So far, I've been checking date values with things like:
 
sort( unique(DOB) )
sort( unique(substr(DOB, 1, 4) ) )
sort( unique(substr(DOB, 6, 7) ) )
sort( unique(substr(DOB, 9, 10) ) )
 
These are good for seeing, say, year values that are clearly in error, but don't do much to catch the impossible date I cited above.
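
Something along the following lines might go further than the substr() checks. It is a sketch that assumes DOB is stored as "YYYY-MM-DD" character strings (as the substr() positions above suggest). The function name check_dob, the example values, and the 1900-01-01 lower bound are all assumptions made for illustration.

```r
# Hypothetical helper for dates stored as "YYYY-MM-DD" strings.
check_dob <- function(x, earliest = as.Date("1900-01-01"), latest = Sys.Date()) {
  # Exactly four, two and two digits separated by hyphens
  well_formed <- !is.na(x) & grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)
  # Impossible calendar dates come back as NA
  parsed <- as.Date(x, format = "%Y-%m-%d")
  valid <- well_formed & !is.na(parsed)
  # Plausibility window; the bounds are assumptions to adjust per study
  in_range <- valid & parsed >= earliest & parsed <= latest
  data.frame(value = x, well_formed, valid, in_range, stringsAsFactors = FALSE)
}

# Made-up example values
DOB <- c("1957-03-15", "2011-02-31", "21931-11-23", "1857-06-01")
report <- check_dob(DOB)
report[!report$in_range, ]
```

The rows that come back are the ones worth inspecting by hand: the impossible date, the five-digit year, and the date outside the plausible window.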
 
How can I use R to better scrutinize my date data? Is there any way to make it complain more when there is a problem with my date data?
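
To illustrate what I mean by "complain more", here is a rough sketch of a stricter wrapper around as.Date(). The name strict_date is hypothetical (it is not an existing R function). It stops with an error if any supplied value fails to parse or does not format back to the original string, and it assumes zero-padded entries.

```r
# Hypothetical wrapper: behaves like as.Date(x, format = fmt),
# but raises an error instead of quietly returning NA or a truncated date.
strict_date <- function(x, fmt = "%m/%d/%Y") {
  parsed <- as.Date(x, format = fmt)
  # Genuinely missing inputs are allowed through as NA
  bad <- !is.na(x) & (is.na(parsed) | format(parsed, fmt) != x)
  if (any(bad)) {
    stop("Suspect date value(s): ", paste(unique(x[bad]), collapse = ", "))
  }
  parsed
}

strict_date("03/15/1957")    # returns the Date 1957-03-15
# strict_date("02/31/2011")  # would stop with an error
```

Using warning() in place of stop() would let the conversion finish while still drawing attention to the suspect values.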
 
Thanks,
 
Paul
 
 


