[R] Undesired result

Duncan Murdoch murdoch@dunc@n @end|ng |rom gm@||@com
Wed Feb 17 19:06:01 CET 2021


Just a quick note: if speed is an issue, you can simplify my function and
speed it up quite a bit. I had forgotten that POSIXlt objects act like
vectors; using that, you don't need the inner for loops, and with a little
arithmetic you can also do without the outer while loops.
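The revised function itself is not part of this message. As an illustration only, here is a minimal sketch of a vectorized version. The name adjustCenturyVec() is hypothetical, the code is not necessarily what the note above had in mind, and it keeps the arguments of adjustCentury() from the quoted message below. Like the original, it does not special-case 29 February.

```r
# Hypothetical vectorized variant of adjustCentury(); a sketch, not the original code.
# Assumes inputString is a character vector and outputDate a Date vector of the
# same length. The default for outputDate needs the anytime package, but R only
# evaluates it when outputDate is not supplied.
adjustCenturyVec <- function(inputString,
                             outputDate = anytime::anydate(inputString, useR = TRUE),
                             start = "1922-01-01") {
  start <- as.Date(start)
  startYear <- as.POSIXlt(start)$year  # years since 1900

  # Only entries written with a two-digit year are adjusted; NA dates are skipped
  twodigityear <- !grepl("[[:digit:]]{4}", inputString) & !is.na(outputDate)

  # POSIXlt components are vectors, so every year can be shifted at once
  lt <- as.POSIXlt(outputDate)

  # Whole centuries that move each year into [startYear, startYear + 99]
  shift <- ifelse(twodigityear, -100L * ((lt$year - startYear) %/% 100L), 0L)
  lt$year <- lt$year + shift

  # A date in the start year but before the start day belongs one century later
  tooEarly <- twodigityear & as.Date(lt) < start
  lt$year <- lt$year + ifelse(tooEarly, 100L, 0L)

  as.Date(lt)
}
```

With anytime attached and the formats added as in the quoted message, it should be callable the same way, e.g. DFX$anew <- adjustCenturyVec(DFX$ddate, start = "1921-01-01"). Replacing the loops with a single floor division (%/%) plus one correction step is what removes both the for and the while loops.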

Duncan Murdoch

On 17/02/2021 12:50 p.m., Duncan Murdoch wrote:
> On 17/02/2021 9:50 a.m., Val wrote:
>> Hi all,
>>
>> I am reading a data file that has dates in different formats. I wanted to
>> standardize them to one format and used the anytime package, but got
>> undesired results, as shown below. It gave me the year 2093 instead of 1993.
>>
>>
>> library(anytime)
>> DFX <- read.table(text="name ddate
>>     A  19-10-02
>>     D  11/19/2006
>>     F  9/9/2011
>>     G1  12/29/2010
>>     AA   10/18/93 ",header=TRUE)
>> getFormats()   # list the formats anytime currently tries
>> addFormats(c("%d-%m-%y"))
>> addFormats(c("%m-%d-%y"))
>> addFormats(c("%Y/%d/%m"))
>> addFormats(c("%m/%d/%y"))
>>
>> DFX$anew <- anydate(DFX$ddate)
>>
>> Output
>>    name      ddate       anew
>> 1    A   19-10-02 2002-10-19
>> 2    D 11/19/2006 2020-11-19
>> 3    F   9/9/2011 2011-09-09
>> 4   G1 12/29/2010 2020-12-29
>> 5   AA   10/18/93 2093-10-18
>>
>> The problem is in the last row. It should be 1993-10-18 instead of 2093-10-18.
>>
>> How do I correct this?
> 
> This looks a little tricky.  The basic idea is that the %y format has to
> guess at the century, but the guess depends on things specific to your
> system.  So what would be nice is to say "two digit dates should be
> assumed to fall between 1922 and 2021", but there's no way to do that
> directly.
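For comparison, base R's own strptime() documents a fixed rule for %y on input: values 00 to 68 get the prefix 20 and 69 to 99 get 19 (see ?strptime). A quick check using only base R, not anytime (whose default parser produced 2093 in the output above):

```r
# Base R's documented %y rule on input (see ?strptime):
# 00-68 become 20xx, 69-99 become 19xx.
as.Date("10/18/93", format = "%m/%d/%y")  # "1993-10-18"
as.Date("10/18/68", format = "%m/%d/%y")  # "2068-10-18"
as.Date("10/18/69", format = "%m/%d/%y")  # "1969-10-18"
```

That fixed pivot happens to give 1993 for row AA, but it would still put a 1968 date in 2068, which is why choosing the 100-year window yourself is more robust.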
> 
> What you could do is recognize when you have a two digit year, and then
> force the result into the range you want.  Here's a function that does
> that, but it's not really tested much at all, so be careful if you use
> it. (One thing: I recommend the 'useR = TRUE' option to anydate(); it
> worked better in my tests than the default, which also turned 2006 and
> 2010 into 2020 in rows D and G1 above.)
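The arithmetic behind "forcing the result into the range" is easiest to see on bare years. A minimal sketch with a hypothetical helper windowYear(), which is not part of anytime or of the function below; it ignores month and day, which adjustCentury() handles by comparing whole dates against start.

```r
# Hypothetical helper: move a year into the window [startYear, startYear + 99]
# by adding or subtracting whole centuries. R's %% returns a non-negative
# result for a positive divisor, so years before startYear wrap forward.
windowYear <- function(year, startYear = 1922) {
  startYear + (year - startYear) %% 100
}

windowYear(c(2093, 2019, 1921, 2022, 2068))  # 1993 2019 2021 1922 1968
```

Only years that came from two-digit input should go through this; a genuine four-digit 1893 would otherwise be moved to 1993.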
> 
> adjustCentury <- function(inputString,
>                           outputDate = anydate(inputString, useR = TRUE),
>                           start = "1922-01-01") {
> 
>     start <- as.Date(start)
> 
>     # Inputs with no run of four digits are assumed to have a two-digit year
>     twodigityear <- !grepl("[[:digit:]]{4}", inputString)
> 
>     # Move two-digit-year dates before 'start' forward, one century at a time
>     while (length(bad <- which(twodigityear & outputDate < start))) {
>       for (i in bad) {
>         longdate <- as.POSIXlt(outputDate[i])
>         longdate$year <- longdate$year + 100
>         outputDate[i] <- as.Date(longdate)
>       }
>     }
> 
>     # 'finish' is 'start' plus 100 years: the end of the accepted window
>     longdate <- as.POSIXlt(start)
>     longdate$year <- longdate$year + 100
>     finish <- as.Date(longdate)
> 
>     # Move two-digit-year dates on or after 'finish' back, one century at a time
>     while (length(bad <- which(twodigityear & outputDate >= finish))) {
>       for (i in bad) {
>         longdate <- as.POSIXlt(outputDate[i])
>         longdate$year <- longdate$year - 100
>         outputDate[i] <- as.Date(longdate)
>       }
>     }
>     outputDate
> }
> 
> library(anytime)
> DFX <- read.table(text="name ddate
>     A  19-10-02
>     D  11/19/2006
>     F  9/9/2011
>     G1  12/29/2010
>     AA   10/18/93
>     BB   10/18/1893
>     CC   10/18/2093",header=TRUE)
> 
> addFormats(c("%d-%m-%y"))
> addFormats(c("%m-%d-%y"))
> addFormats(c("%Y/%d/%m"))
> addFormats(c("%m/%d/%y"))
> 
> DFX$anew <- adjustCentury(DFX$ddate, start = "1921-01-01")
> DFX
> #>   name      ddate       anew
> #> 1    A   19-10-02 2019-10-02
> #> 2    D 11/19/2006 2006-11-19
> #> 3    F   9/9/2011 2011-09-09
> #> 4   G1 12/29/2010 2010-12-29
> #> 5   AA   10/18/93 1993-10-18
> #> 6   BB 10/18/1893 1893-10-18
> #> 7   CC 10/18/2093 2093-10-18
>