[R] Mixed format

Rui Barradas ruipbarradas at sapo.pt
Tue Jan 21 12:28:29 CET 2020


Hello,

Inline.

On 21/01/20 at 09:22, Chris Evans wrote:
> I think that might risk giving the wrong date for a date like 1/3/1990, which I think in Val's data would be mdy, not dmy.
> 
> As I read the data, where the separator is "/" the format is mdy and where the separator is "-" it's dmy.  

Maybe you're right, but I really don't know: in my country (Portugal) we
use "/" with dmy. Anyway, what matters is that the OP needs a much better
understanding of the data; the way it is posted is likely to cause errors.
See, for instance, the expected output, where numbers greater than 12
appear in the 1st place in some rows and in the 2nd place in others.
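A minimal base-R sketch of how such ambiguous "/" dates could be flagged before settling on a rule: a string is ambiguous when it is a valid date both as dmy and as mdy and the two readings differ. The helper name `is_ambiguous` is hypothetical, and only the "/" formats with a four-digit year are checked.

```r
# Hypothetical helper: TRUE where a "/"-separated string is a valid date
# both as day/month/year and as month/day/year, but the two readings differ.
is_ambiguous <- function(x) {
  x <- as.character(x)
  as_dmy <- as.Date(x, format = "%d/%m/%Y")
  as_mdy <- as.Date(x, format = "%m/%d/%Y")
  # NA in either reading means that reading is impossible, so not ambiguous
  !is.na(as_dmy) & !is.na(as_mdy) & as_dmy != as_mdy
}

is_ambiguous(c("1/3/1990", "11/19/2006", "9/9/2011", "DEX"))
# [1]  TRUE FALSE FALSE FALSE
```

Rows flagged TRUE are the ones where choosing between a dmy-first and an mdy-first rule actually changes the result; in the OP's sample none of the "/" dates is ambiguous.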


> So I would go for:
> 
> library(lubridate)
> DFX$dnew <- as.Date(NA)                       # all-NA Date column to fill in
> dash  <- grepl("-", DFX$ddate, fixed = TRUE)  # "-" separator: dmy
> slash <- grepl("/", DFX$ddate, fixed = TRUE)  # "/" separator: mdy
> DFX$dnew[dash]  <- dmy(DFX$ddate[dash])
> DFX$dnew[slash] <- mdy(DFX$ddate[slash])
> DFX <- DFX[!is.na(DFX$dnew), ]
> DFX
> 
>    name      ddate       dnew
> 1    A   19-10-02 2002-10-19
> 2    B   22-11-20 2020-11-22
> 3    C   19-01-15 2015-01-19
> 4    D 11/19/2006 2006-11-19
> 5    F   9/9/2011 2011-09-09
> 6    G 12/29/2010 2010-12-29
> 
> But I am so much in awe of Rui's skills with R, and those of most of the regular commentators here, that I submit
> this a little nervously!
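For comparison, Chris's separator rule can be sketched in base R without lubridate, ending with the "%m-%d-%Y" text format the OP asked for. The format strings ("%d-%m-%y" for "-", "%m/%d/%Y" for "/") and the new column name `out` are assumptions read off the sample data, not part of either answer. Base R's `strptime()` ignores trailing characters, which is why "22-11-20u" still parses.

```r
DFX <- read.table(text = "name ddate
A  19-10-02
B  22-11-20u
C  19-01-15
D  11/19/2006
F  9/9/2011
G  12/29/2010
H  DEX", header = TRUE, stringsAsFactors = FALSE)

# Start with an all-NA Date column, then fill it according to the separator
DFX$dnew <- as.Date(rep(NA_character_, nrow(DFX)))
dash  <- grepl("-", DFX$ddate, fixed = TRUE)  # assumed day-month-2-digit year
slash <- grepl("/", DFX$ddate, fixed = TRUE)  # assumed month/day/4-digit year
DFX$dnew[dash]  <- as.Date(DFX$ddate[dash],  format = "%d-%m-%y")
DFX$dnew[slash] <- as.Date(DFX$ddate[slash], format = "%m/%d/%Y")

# Drop rows that are not dates, then render as mm-dd-YYYY text
DFX <- DFX[!is.na(DFX$dnew), ]
DFX$out <- format(DFX$dnew, "%m-%d-%Y")
DFX
```

With this rule row A becomes "10-19-2002" rather than the "19-10-2002" in the OP's desired output; the desired output itself mixes dmy and mdy, which is the inconsistency Rui points out.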

Thanks!

Rui Barradas
> 
> Many thanks to all who teach me so much here; it's lovely, if I am correct, to contribute for a change!
> 
> Chris
> 
> 
> ----- Original Message -----
>> From: "Rui Barradas" <ruipbarradas at sapo.pt>
>> To: "Val" <valkremk at gmail.com>, "r-help at R-project.org (r-help at r-project.org)" <r-help at r-project.org>
>> Sent: Tuesday, 21 January, 2020 00:40:29
>> Subject: Re: [R] Mixed format
> 
>> Hello,
>>
>> The following strategy works with your data.
>> It relies on the fact that most dates are in one of 3 formats: dmy, mdy or ymd.
>> It tries those formats one at a time and, after each try, looks for NA's in the
>> new column; only the rows that are still NA go on to the next format.
>>
>>
>> # first round, format is dmy
>> DFX$dnew <- lubridate::dmy(DFX$ddate)
>> na <- is.na(DFX$dnew)
>>
>> # second round, format is mdy
>> DFX$dnew[na] <- lubridate::mdy(DFX$ddate[na])
>> na <- is.na(DFX$dnew)
>>
>> # last round, format is ymd
>> DFX$dnew[na] <- lubridate::ymd(DFX$ddate[na])
>>
>> # remove what didn't fit any format
>> DFX <- DFX[!is.na(DFX$dnew), ]
>> DFX
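The same try-each-format-in-turn strategy can also be written as a small base-R function that takes the list of formats as an argument. This is a sketch, not a translation of the code above: the name `parse_mixed` and the default formats are assumptions, ordered to mirror the dmy, mdy, ymd rounds, and because base R's `strptime()` ignores trailing characters, "22-11-20u" is read as 22 November 2020.

```r
# Hypothetical helper: try each format in turn, filling only rows still NA.
parse_mixed <- function(x,
                        formats = c("%d-%m-%y", "%d/%m/%Y", "%m/%d/%Y", "%Y-%m-%d")) {
  x <- as.character(x)
  out <- as.Date(rep(NA_character_, length(x)))  # all-NA Date vector
  for (f in formats) {
    na <- is.na(out)
    if (!any(na)) break                           # everything parsed already
    out[na] <- as.Date(x[na], format = f)         # NA where f does not fit
  }
  out
}

d <- parse_mixed(c("19-10-02", "22-11-20u", "11/19/2006", "9/9/2011", "DEX"))
format(d, "%m-%d-%Y")  # NA stays NA; drop those rows before formatting
```

Reordering `formats` (for instance putting "%m/%d/%Y" before "%d/%m/%Y") is how this sketch would switch to Chris's mdy-first reading of the "/" dates.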
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> On 20/01/20 at 22:58, Val wrote:
>>> Hi All,
>>>
>>> I have a data frame where one column has mixed date formats:
>>> some dates are in the form "%m-%d-%y", others in "%m/%d/%Y", and some are not dates at all.
>>>
>>> Is there a way to delete the rows that contain non-dates and
>>> standardize the dates in one date format, like %m-%d-%Y?
>>> Please see my sample data and desired output.
>>>
>>> DFX<-read.table(text="name ddate
>>>     A  19-10-02
>>>     B  22-11-20u
>>>     C  19-01-15
>>>     D  11/19/2006
>>>     F  9/9/2011
>>>     G  12/29/2010
>>>     H  DEX",header=TRUE)
>>>
>>> Desired output
>>> name ddate
>>> A  19-10-2002
>>> B  22-11-2020
>>> C  19-01-2015
>>> D  11-19-2006
>>> F  09-09-2011
>>> G  12-29-2010
>>>
>>> Thank you
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
