[R] Checking for invalid dates: Code works but needs improvement
David Winsemius
dwinsemius at comcast.net
Mon Jan 30 19:15:48 CET 2012
On Jan 30, 2012, at 8:44 AM, Paul Miller wrote:
> Hi Rui, Marc, and Gabor,
>
> Thanks for your replies to my question. All were helpful and it was
> interesting to see how different people approach various aspects of
> the same problem.
>
> Spent some time this weekend looking at Rui's solution, which is
> certainly much clearer than my own. Managed to figure out pretty
> much all the details of how it works. Also managed to tweak it
> slightly in order to make it do exactly what I wanted. (See revised
> code below.)
>
> Still have a few questions, though. The first concerns the
> insertion of the code "Y > 2012" to set year values beyond 2012 to
> NA (on line 10 of the function below). When I add this (or use it
> in place of "nchar(Y) > 4"), the code successfully finds the problem
> date "05/16/2015". After that, though, it produces the following
> error message:
>
> Error in if (any(is.na(x) & M != "un" & Y != "un")) cat("Warning: Invalid date values in", :
>   missing value where TRUE/FALSE needed
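It's a bit dangerous to use comparison operators on mixed data types.
In your case you are comparing a character value to a numeric value,
and R handles that by converting the number to a string and comparing
the two as strings, so 2015 is not the same as "2015". Try
"123" > 1000 (it is TRUE) if you want a quick counter-example. Your
year values also include "un", and "un" > 2012 is TRUE as well, so the
unknown year in "11/20/un" is silently set to NA. You may want to
coerce the Y values to "numeric" mode to be safe.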
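A quick way to see this at the console. The vector Y below is a made-up sample mixing the kinds of year strings in the test data, and the results assume a typical locale in which digits sort before letters:

```r
# Comparing a character vector with a number: R converts the number to a
# character string and compares the two as strings, not as numbers.
"123" > 1000    # TRUE:  "123" sorts after "1000"
123 > 1000      # FALSE: a genuine numeric comparison

Y <- c("2011", "2015", "un", "123")
Y > 2012        # FALSE TRUE TRUE FALSE ("un" sorts after "2012")

# Safer: convert to numbers first. Non-numeric entries such as "un" become
# NA; suppressWarnings() hides the "NAs introduced by coercion" warning.
Ynum <- suppressWarnings(as.numeric(Y))
Ynum > 2012     # FALSE TRUE NA FALSE
```

With numeric years, "un" gives NA instead of TRUE, so it can be handled explicitly rather than being caught by an accidental string comparison.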
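Also, once the ifelse() has set some years to NA, Y != "un" is NA for
those elements, so is.na(x) & M != "un" & Y != "un" can be NA as well.
any() returns NA when it finds no TRUE and at least one NA, and if()
cannot take NA. That is what happens in the metastaticDT column;
birthDT and diagnosisDT each also contain a clearly invalid date
(06/31/1933, 02/30/2010), so any() finds a TRUE there and never
returns NA. You probably want:
any(is.na(x) & M != "un" & Y != "un", na.rm = TRUE)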
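To see why if() fails, here are the three pieces of the warning test for the metastaticDT column, worked out by hand from the quoted test data under the posted code (with Y > 2012 included). The variable names are illustrative only:

```r
# Hand-derived values for metastaticDT:
#   un/17/2011  03/17/2011  07/un/2011  04/30/2011  11/20/un
x_na <- c(TRUE, FALSE, FALSE, FALSE, TRUE)   # is.na(x)
M_ok <- c(FALSE, TRUE, TRUE, TRUE, TRUE)     # M != "un"
Y_ok <- c(TRUE, TRUE, TRUE, TRUE, NA)        # Y != "un" after "un" became NA

flags <- x_na & M_ok & Y_ok   # FALSE FALSE FALSE FALSE NA
any(flags)                    # NA: no TRUE present, one unknown value
any(c(NA, TRUE))              # TRUE: a single TRUE is enough

# if (NA) is an error, which is the message in the original post
res <- try(if (any(flags)) "warn", silent = TRUE)
inherits(res, "try-error")    # TRUE

# Two ways to make the condition always TRUE or FALSE
any(flags, na.rm = TRUE)      # FALSE
isTRUE(any(flags))            # FALSE
```

Note that na.rm = TRUE simply ignores the NA elements; if a year was set to NA because it was out of range, that element is then not counted, so it is worth making sure the test itself cannot produce NA.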
>
> Why is this happening? If the code correctly handles the
> date "06/20/1840" without producing an error, why can't it do
> likewise with "05/16/2015"?
>
> The second question is why it's necessary to put "x" on line 15
> following "cat("Warning ...)". I know that I don't get any date
> columns if I don't include this, but am not sure why.
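An R function returns the value of the last expression it evaluates (unless return() is called earlier). Without the trailing x, the last expression in f() is the if (...) cat(...) statement, whose value is NULL, so no dates are handed back. A minimal illustration with two hypothetical helpers that differ only in that final line:

```r
without_x <- function(v) {
  x <- as.Date(v, format = "%Y-%m-%d")
  if (any(is.na(x))) cat("Warning: invalid value\n")
  # last expression is the if(): its value is cat()'s NULL, or an
  # invisible NULL when the condition is FALSE
}
with_x <- function(v) {
  x <- as.Date(v, format = "%Y-%m-%d")
  if (any(is.na(x))) cat("Warning: invalid value\n")
  x   # last expression, so the Date vector is returned
}

is.null(without_x(c("2011-03-17", "2011-13-01")))   # TRUE (after the warning)
with_x(c("2011-03-17", "2011-13-01"))              # "2011-03-17" NA
```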
>
> The third question is whether it's possible to change the class of
> the date variables without using a for loop. I played around with
> this a little but didn't find a vectorized alternative. It may be
> that this is not really important. It's just that I've read in
> several places that for loops should be avoided wherever possible.
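The loop is needed only because sapply() simplifies the per-column results into a matrix, and that step drops the "Date" class. A sketch of two loop-free alternatives, using a small made-up list of Date vectors in place of f()'s output. lapply() still visits each column, so the gain is brevity rather than speed:

```r
# Two made-up Date columns, standing in for what f() returns.
dates <- list(a = as.Date(c("2011-03-17", "2011-04-30")),
              b = as.Date(c("2009-05-23", "2008-12-20")))

# sapply() simplifies the list to a matrix via unlist(), which drops the
# "Date" class and leaves the underlying day counts.
m <- sapply(dates, function(d) d)
class(m[, "a"])   # "numeric"

# Option 1: restore the class on all columns at once, without a for loop.
DF1 <- data.frame(m)
DF1[] <- lapply(DF1, function(col) structure(col, class = "Date"))

# Option 2: never lose it; lapply() returns a list, and data.frame()
# keeps the class of each Date column.
DF2 <- data.frame(lapply(dates, function(d) d))

str(DF1)
str(DF2)
```

Option 2 is the simpler change for fun(): replacing sapply() with lapply() means there is no class to restore afterwards.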
>
> Thanks,
>
> Paul
>
>
> ##########################################
> #### Code for detecting invalid dates ####
> ##########################################
>
> #### Test Data ####
>
> connection <- textConnection("
> 1 11/23/21931 05/23/2009 un/17/2011
> 2 06/20/1840 02/30/2010 03/17/2011
> 3 06/17/1935 12/20/2008 07/un/2011
> 4 05/31/1937 01/18/2007 04/30/2011
> 5 06/31/1933 05/16/2015 11/20/un
> ")
>
> TestDates <- data.frame(scan(connection,
> list(Patient=0, birthDT="", diagnosisDT="", metastaticDT="")))
>
> close(connection)
>
> #### Input Data ####
>
> TDSaved <- TestDates
>
> #### List of Date Variables ####
>
> DateNames <- c("birthDT", "diagnosisDT", "metastaticDT")
>
> #### Date Function ####
>
> fun <- function(Dat){
>   f <- function(jj, DF){
>     x <- as.character(DF[, jj])
>     x <- unlist(strsplit(x, "/"))
>     n <- length(x)
>     M <- x[seq(1, n, 3)]
>     D <- x[seq(2, n, 3)]
>     Y <- x[seq(3, n, 3)]
>     D[D == "un"] <- "15"
>     Y <- ifelse(nchar(Y) > 4 | Y > 2012 | Y < 1900, NA, Y)
>     x <- as.Date(paste(Y, M, D, sep="-"), format="%Y-%m-%d")
>     if(any(is.na(x) & M != "un" & Y != "un"))
>       cat("Warning: Invalid date values in", jj, "\n",
>           as.character(DF[is.na(x), jj]), "\n")
>     x
>   }
>   Dat <- data.frame(sapply(names(Dat), function(j) f(j, Dat)))
>   for(i in names(Dat)) class(Dat[[i]]) <- "Date"
>   Dat
> }
>
> #### Output Data ####
>
> TD <- TDSaved
>
> #### Read Dates ####
>
> TD[, DateNames] <- fun(TD[, DateNames])
> TD
>
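Putting these points together, here is one possible revision of fun(), offered as a sketch rather than a drop-in fix. The name checkDates() and its minYear/maxYear arguments are hypothetical, and it assumes every entry has the form mm/dd/yyyy with "un" for unknown parts. Years are compared as numbers, the warning test can never be NA, and lapply() keeps the Date class so no loop is needed. The test data are retyped from the quoted example with data.frame() instead of scan():

```r
checkDates <- function(Dat, minYear = 1900, maxYear = 2012) {
  f <- function(jj) {
    raw <- as.character(Dat[[jj]])
    parts <- do.call(rbind, strsplit(raw, "/"))  # one row per date: M, D, Y
    M <- parts[, 1]; D <- parts[, 2]; Y <- parts[, 3]
    D[D == "un"] <- "15"                          # impute unknown day, as before
    known <- M != "un" & Y != "un"                # month and year both given
    Ynum <- suppressWarnings(as.numeric(Y))       # "un" -> NA, no string compare
    Y[!is.na(Ynum) & (Ynum < minYear | Ynum > maxYear)] <- NA
    x <- as.Date(paste(Y, M, D, sep = "-"), format = "%Y-%m-%d")
    bad <- is.na(x) & known                       # never NA: both parts are T/F
    if (any(bad))
      cat("Warning: Invalid date values in", jj, "\n", raw[bad], "\n")
    x                                             # value returned by f()
  }
  Dat[] <- lapply(names(Dat), f)                  # keeps names and Date class
  Dat
}

TestDates <- data.frame(
  Patient      = 1:5,
  birthDT      = c("11/23/21931", "06/20/1840", "06/17/1935",
                   "05/31/1937", "06/31/1933"),
  diagnosisDT  = c("05/23/2009", "02/30/2010", "12/20/2008",
                   "01/18/2007", "05/16/2015"),
  metastaticDT = c("un/17/2011", "03/17/2011", "07/un/2011",
                   "04/30/2011", "11/20/un"),
  stringsAsFactors = FALSE)

DateNames <- c("birthDT", "diagnosisDT", "metastaticDT")
TD <- TestDates
TD[, DateNames] <- checkDates(TD[, DateNames])
TD
```

Dates with an unknown month or year still come back as NA, but they no longer trigger the warning, which matches the intent of the original M != "un" & Y != "un" test.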
David Winsemius, MD
Heritage Laboratories
West Hartford, CT