[R] Checking for invalid dates: Code works but needs improvement
Rui Barradas
ruipbarradas at sapo.pt
Fri Jan 27 05:18:04 CET 2012
Hello, again.
I now have a more complete answer to your points.
> 1. It's too long. My understanding is that skilled programmers can usually
> or often complete tasks like this in a few lines.
It's not much shorter, but it's more readable. (Then again, the programmer's
own opinion is always suspect.)
> 2. It's not vectorized. I started out trying to do something that was
> vectorized
> but ran into problems with the strsplit function. I looked at the help
> file and
> it appears this function will only accept a single character vector.
All but one of the instructions are vectorized, and the one that is not
only loops over a few column names.
Use 'unlist' on the 'strsplit' function's output to give a vector.
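As a minimal sketch of that point (the sample values are made up for illustration, they are not from your data): 'strsplit' returns a list with one element per input string, and 'unlist' flattens it so the month, day and year parts can be picked out by position. This assumes every entry splits into exactly three parts.

```r
# Hypothetical sample dates in month/day/year form
x <- c("1/15/2001", "12/un/1999", "2/30/2010")

parts <- unlist(strsplit(x, "/"))   # one flat character vector
n <- length(parts)

M <- parts[seq(1, n, 3)]   # every third element, starting at the 1st
D <- parts[seq(2, n, 3)]   # ... starting at the 2nd
Y <- parts[seq(3, n, 3)]   # ... starting at the 3rd
```

If an entry has fewer or more than two slashes, the positions shift and the parts no longer line up, so it may be worth checking the input first.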
> 4. There's no way to specify names for input and output data. I imagine
> this would
> be fairly easy to specify this in the arguments to a function but am not
> sure how to
> incorporate it into a for loop.
You can now specify any matrix or data.frame, but it is only meant to process
the columns with dates. (Strictly speaking that is not true: it will process
anything with a '/' in it, so pay attention to what you pass in.)
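One possible way to guard against that, sketched under assumptions: the helper name 'has_three_parts', the sample data and the pattern below are mine, not part of the original code. The idea is to keep only the columns whose entries all look like three parts separated by two slashes.

```r
# Hypothetical data: an id column, a date column and a free-text column
DF <- data.frame(Patient = 1:3,
                 birthDT = c("1/15/2001", "12/un/1999", "2/30/2010"),
                 note    = c("ok", "n/a", "ok"),
                 stringsAsFactors = FALSE)

# TRUE when every non-missing entry has the form part/part/part
has_three_parts <- function(v) {
  v <- as.character(v)
  v <- v[!is.na(v)]
  length(v) > 0 && all(grepl("^[^/]+/[^/]+/[^/]+$", v))
}

DateCols <- names(DF)[sapply(DF, has_three_parts)]
```

This only checks the shape of the strings, not whether they are valid dates.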
Near the beginning of your code, after the lines quoted below, add the
'TDSaved' assignment:
> TestDates <- data.frame(scan(connection,
> list(Patient=0, birthDT="", diagnosisDT="",
> metastaticDT="")))
>
> close(connection)
TDSaved <- TestDates # to avoid reopening the connection
And then, after all of it,
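fun <- function(Dat){
    # Convert one column of "month/day/year" strings to Date
    f <- function(jj, DF){
        x <- as.character(DF[, jj])
        x <- unlist(strsplit(x, "/"))
        n <- length(x)
        M <- x[seq(1, n, 3)]
        D <- x[seq(2, n, 3)]
        Y <- x[seq(3, n, 3)]
        D[D == "un"] <- "15"    # replace the "un" placeholder day with 15
        # Y is character, so compare it as a number; 'Y < 1900' compares strings
        Ynum <- suppressWarnings(as.numeric(Y))
        Y <- ifelse(nchar(Y) > 4 | Ynum < 1900, NA, Y)
        x <- as.Date(paste(Y, M, D, sep="-"), format="%Y-%m-%d")
        if(any(is.na(x)))
            cat("Warning: Invalid date values in", jj, "\n",
                as.character(DF[is.na(x), jj]), "\n")
        x
    }
    colinx <- colnames(as.data.frame(Dat))
    # sapply() drops the Date class, so it is restored in the loop below
    Dat <- data.frame(sapply(colinx, function(j) f(j, Dat)))
    for(i in colinx) class(Dat[[i]]) <- "Date"
    Dat
}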
TD <- TDSaved
TD[, DateNames] <- fun(TD[, DateNames])
TD
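The check for invalid dates in 'f' rests on 'as.Date' returning NA when a string does not match the format or is not a real calendar date. A small sketch of that behaviour, with made-up values:

```r
# Hypothetical year-month-day strings; the 2nd and 3rd are not real dates
s <- c("2001-01-15", "2010-02-30", "1999-13-01")
d <- as.Date(s, format = "%Y-%m-%d")

bad <- is.na(d)   # TRUE where the string could not be parsed as a date
s[bad]            # the offending input values
```

Because the format is given explicitly, unparseable entries become NA rather than stopping with an error.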
I had fun writing it.
Good luck.
Rui Barradas