[R] Undesired result
Duncan Murdoch
murdoch@dunc@n @end|ng |rom gm@||@com
Wed Feb 17 19:06:01 CET 2021
Just a quick note: you can simplify my function and speed it up quite a
bit if speed is an issue. I had forgotten that the POSIXlt type could
act like a vector; using that you don't need those inner for loops, and
with a little calculation you can also do without the outer while loops.
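A minimal sketch of what that simplification might look like, not a tested replacement. The name adjustCenturyVec is made up for illustration, and outputDate is a required Date vector here (in the original function it defaulted to anydate(inputString, useR = TRUE)) so that the sketch runs in base R alone. It relies on the year field of a POSIXlt object being a vector, and uses modular arithmetic in place of the while loops:

```r
# Hypothetical vectorized variant of adjustCentury(); a sketch, lightly checked.
# inputString: the original date strings
# outputDate:  the parsed Date vector (e.g. from anydate(inputString, useR = TRUE))
# start:       first day of the 100-year window for two-digit years
adjustCenturyVec <- function(inputString, outputDate, start = "1922-01-01") {
  start <- as.Date(start)
  startYear <- as.POSIXlt(start)$year          # years since 1900

  # Strings with no run of four digits are assumed to have a two-digit year
  twodigityear <- !grepl("[[:digit:]]{4}", inputString)

  longdate <- as.POSIXlt(outputDate)           # $year is a vector, one per date

  # Move each year into startYear .. startYear + 99 in one step
  shifted <- startYear + (longdate$year - startYear) %% 100
  longdate$year[twodigityear] <- shifted[twodigityear]

  # Dates in the start year but before the start day belong a century later
  early <- twodigityear & !is.na(outputDate) & as.Date(longdate) < start
  longdate$year[early] <- longdate$year[early] + 100

  as.Date(longdate)
}

parsed <- as.Date(c("2093-10-18", "1893-10-18", "2019-10-02"))
adjustCenturyVec(c("10/18/93", "10/18/1893", "19-10-02"), parsed,
                 start = "1921-01-01")
#> [1] "1993-10-18" "1893-10-18" "2019-10-02"
```

As in the looped version, a 29 February that lands in a non-leap year after shifting rolls over to 1 March.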
Duncan Murdoch
On 17/02/2021 12:50 p.m., Duncan Murdoch wrote:
> On 17/02/2021 9:50 a.m., Val wrote:
>> Hi all,
>>
>> I am reading a data file that contains several different date formats. I
>> wanted to standardize them to one format and used the anytime library, but
>> got undesired results, as shown below. It gave me the year 2093 instead of 1993.
>>
>>
>> library(anytime)
>> DFX<-read.table(text="name ddate
>> A 19-10-02
>> D 11/19/2006
>> F 9/9/2011
>> G1 12/29/2010
>> AA 10/18/93 ",header=TRUE)
>> getFormats()
>> addFormats(c("%d-%m-%y"))
>> addFormats(c("%m-%d-%y"))
>> addFormats(c("%Y/%d/%m"))
>> addFormats(c("%m/%d/%y"))
>>
>> DFX$anew=anydate(DFX$ddate)
>>
>> Output
>>   name      ddate       anew
>> 1    A   19-10-02 2002-10-19
>> 2    D 11/19/2006 2020-11-19
>> 3    F   9/9/2011 2011-09-09
>> 4   G1 12/29/2010 2020-12-29
>> 5   AA   10/18/93 2093-10-18
>>
>> The problem is in the last row. It should be 1993-10-18 instead of 2093-10-18.
>>
>> How do I correct this?
>
> This looks a little tricky. The basic idea is that the %y format has to
> guess at the century, but the guess depends on things specific to your
> system. So what would be nice is to say "two-digit years should be
> assumed to fall between 1922 and 2021", but there's no way to do that
> directly.
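To see the century guess in isolation, here is a small base-R illustration. It assumes the rule documented in ?strptime for current R versions (two-digit years 00 to 68 become 20xx, 69 to 99 become 19xx); other parsers, including the one anydate() uses by default, may guess differently.

```r
# Base R only; the 68/69 cutoff is the one described in ?strptime
x <- c("10/18/93", "10/18/69", "10/18/68", "10/18/02")
d <- as.Date(x, format = "%m/%d/%y")
d
#> [1] "1993-10-18" "1969-10-18" "2068-10-18" "2002-10-18"
```
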
>
> What you could do is recognize when you have a two-digit year, and then
> force the result into the range you want. Here's a function that does
> that, but it has hardly been tested at all, so be careful if you use
> it. (One thing: I recommend the 'useR = TRUE' option to anydate(); it
> worked better in my tests than the default.)
>
> adjustCentury <- function(inputString,
>                           outputDate = anydate(inputString, useR = TRUE),
>                           start = "1922-01-01") {
>
>   start <- as.Date(start)
>
>   # TRUE where the input has no run of four digits, i.e. a two-digit year
>   twodigityear <- !grepl("[[:digit:]]{4}", inputString)
>
>   # Push two-digit-year dates that fall before 'start' forward a century
>   while (length(bad <- which(twodigityear & outputDate < start))) {
>     for (i in bad) {
>       longdate <- as.POSIXlt(outputDate[i])
>       longdate$year <- longdate$year + 100
>       outputDate[i] <- as.Date(longdate)
>     }
>   }
>   # 'finish' is exactly 100 years after 'start'
>   longdate <- as.POSIXlt(start)
>   longdate$year <- longdate$year + 100
>   finish <- as.Date(longdate)
>
>   # Pull two-digit-year dates at or after 'finish' back a century
>   while (length(bad <- which(twodigityear & outputDate >= finish))) {
>     for (i in bad) {
>       longdate <- as.POSIXlt(outputDate[i])
>       longdate$year <- longdate$year - 100
>       outputDate[i] <- as.Date(longdate)
>     }
>   }
>   outputDate
> }
>
> library(anytime)
> DFX<-read.table(text="name ddate
> A 19-10-02
> D 11/19/2006
> F 9/9/2011
> G1 12/29/2010
> AA 10/18/93
> BB 10/18/1893
> CC 10/18/2093",header=TRUE)
>
> addFormats(c("%d-%m-%y"))
> addFormats(c("%m-%d-%y"))
> addFormats(c("%Y/%d/%m"))
> addFormats(c("%m/%d/%y"))
>
> DFX$anew=adjustCentury(DFX$ddate, start = "1921-01-01")
> DFX
> #>   name      ddate       anew
> #> 1    A   19-10-02 2019-10-02
> #> 2    D 11/19/2006 2006-11-19
> #> 3    F   9/9/2011 2011-09-09
> #> 4   G1 12/29/2010 2010-12-29
> #> 5   AA   10/18/93 1993-10-18
> #> 6   BB 10/18/1893 1893-10-18
> #> 7   CC 10/18/2093 2093-10-18
>