[R] Checking for invalid dates: Code works but needs improvement

Rui Barradas ruipbarradas at sapo.pt
Fri Jan 27 05:18:04 CET 2012


Hello, again.

I now have a more complete answer to your points.

> 1. It's too long. My understanding is that skilled programmers can usually
> or often complete tasks like this in a few lines.

It's not much shorter, but it's more readable. (The programmer is always
suspect.)

> 2. It's not vectorized. I started out trying to do something that was
> vectorized but ran into problems with the strsplit function. I looked at
> the help file and it appears this function will only accept a single
> character vector.

All but one of the instructions are vectorized, and the one that is not
only loops over a few column names.
Use 'unlist' on the output of 'strsplit' to get a single vector.
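
As a small illustration of that point (the sample dates below are made up
for the example, they are not your data): 'strsplit' takes the whole
character vector at once and returns a list, and 'unlist' flattens it so
that months, days and years can be picked out with a stride of 3.

```r
# Made-up sample values in month/day/year form.
dates <- c("1/15/2011", "12/un/1999", "7/4/1976")

# strsplit() accepts the whole character vector and returns a list,
# one element per input string.
parts <- strsplit(dates, "/")

# unlist() flattens that list into one character vector:
# month, day, year, month, day, year, ...
flat <- unlist(parts)
n <- length(flat)

M <- flat[seq(1, n, 3)]   # every third element starting at 1: months
D <- flat[seq(2, n, 3)]   # starting at 2: days
Y <- flat[seq(3, n, 3)]   # starting at 3: years

# The stride-3 indexing is only valid if every value has exactly 3 parts.
stopifnot(all(lengths(parts) == 3))
```

Note that the stride only lines up when every value splits into exactly
three pieces; a value with a missing or extra '/' would shift everything
after it.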

> 4. There's no way to specify names for input and output data. I imagine
> this would be fairly easy to specify this in the arguments to a function
> but am not sure how to incorporate it into a for loop.

You can now pass any matrix or data.frame, but it will only process the
columns with dates. (Strictly, that is not true: it will process anything
with a '/' in it, so pay attention to which columns you pass.)
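
One possible way to guard against that, as a rough sketch only: pick the
columns whose values all look like three pieces separated by '/'. The
helper 'looks_like_date' and the data frame 'toy' are hypothetical names
invented for this example; they are not part of the code in this thread.

```r
# Hypothetical helper: TRUE when every non-missing value of a column
# looks like "something/something/something".
looks_like_date <- function(col){
    col <- as.character(col)
    col <- col[!is.na(col)]
    length(col) > 0 && all(grepl("^[^/]+/[^/]+/[^/]+$", col))
}

# Made-up example data: 'ratio' contains a '/' but is not a date column.
toy <- data.frame(Patient = 1:2,
                  birthDT = c("1/15/1950", "3/un/1962"),
                  ratio = c("1/2", "3/4"),
                  stringsAsFactors = FALSE)

# Names of the columns that pass the check.
date_cols <- names(toy)[vapply(toy, looks_like_date, logical(1))]
date_cols
```

This only checks the shape of the strings, not whether they are valid
dates; the validity check is still done by the conversion itself.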

Near the beginning of your code include the following:


> TestDates <- data.frame(scan(connection,
>                 list(Patient=0, birthDT="", diagnosisDT="",
>                      metastaticDT="")))
>
> close(connection)

TDSaved <- TestDates    # to avoid reopening the connection

And then, after all of it,

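fun <- function(Dat){
    # Convert one column of "month/day/year" strings to Date,
    # printing the original values that fail to convert.
    f <- function(jj, DF){
        x <- as.character(DF[, jj])
        # Split all values on "/" and flatten; this assumes every value
        # has exactly three parts.
        x <- unlist(strsplit(x, "/"))
        n <- length(x)
        M <- x[seq(1, n, 3)]
        D <- x[seq(2, n, 3)]
        Y <- x[seq(3, n, 3)]
        D[D == "un"] <- "15"    # replace "un" days with the 15th
        # Compare years as numbers: a character comparison would let
        # values such as "99" pass the 1900 test.
        Yn <- suppressWarnings(as.numeric(Y))
        Y <- ifelse(nchar(Y) > 4 | is.na(Yn) | Yn < 1900, NA, Y)
        x <- as.Date(paste(Y, M, D, sep="-"), format="%Y-%m-%d")
        if(any(is.na(x)))
            cat("Warning: Invalid date values in", jj, "\n",
                as.character(DF[is.na(x), jj]), "\n")
        x
    }
    colinx <- colnames(as.data.frame(Dat))
    # sapply() returns the dates as plain numbers, so the class is restored below.
    Dat <- data.frame(sapply(colinx, function(j) f(j, Dat)))
    for(i in colinx) class(Dat[[i]]) <- "Date"
    Dat
}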
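
A note on why the 'for' loop that resets the class is there, with a small
made-up example: 'sapply' simplifies its result to a numeric matrix, so the
"Date" class is dropped. Using 'lapply' instead would be one alternative
that keeps the class; this is only a sketch of that alternative, not a
change to the function above.

```r
# Made-up list of two Date vectors.
lst <- list(a = as.Date(c("2011-01-15", "2011-02-01")),
            b = as.Date(c("1999-12-15", NA)))

# sapply() simplifies to a plain numeric matrix: the "Date" class is lost
# and the values become days since 1970-01-01.
m <- sapply(lst, function(x) x)
class(m[, "a"])          # "numeric"

# lapply() keeps each element as a Date, so data.frame() builds Date
# columns directly and no class() loop is needed.
df <- data.frame(lapply(lst, function(x) x))
sapply(df, class)
```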

TD <- TDSaved

TD[, DateNames] <- fun(TD[, DateNames])

TD

I had fun writing it.
Good luck.

Rui Barradas





