[R] Mixed format
Chris Evans
chris using psyctc.org
Tue Jan 21 10:22:24 CET 2020
I think that might risk giving the wrong date for something like 1/3/1990: trying dmy first would parse it as
1 March 1990, whereas I think in Val's data it is mdy, i.e. 3 January 1990.
As I read the data, where the separator is "/" the format is mdy and where the separator is "-" it's dmy. So I would
go for:
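library(lubridate)
# dnew must already exist as a Date column before parts of it can be filled in
DFX$dnew <- as.Date(NA)
dash <- grepl("-", DFX$ddate, fixed = TRUE)   # rows taken to be dmy
slash <- grepl("/", DFX$ddate, fixed = TRUE)  # rows taken to be mdy
DFX$dnew[dash] <- dmy(DFX$ddate[dash])
DFX$dnew[slash] <- mdy(DFX$ddate[slash])
# drop the rows that matched neither format
DFX <- DFX[!is.na(DFX$dnew), ]
DFX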
  name      ddate       dnew
1    A   19-10-02 2002-10-19
2    B   22-11-20 2020-11-22
3    C   19-01-15 2015-01-19
4    D 11/19/2006 2006-11-19
5    F   9/9/2011 2011-09-09
6    G 12/29/2010 2010-12-29
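For anyone without lubridate to hand, the same separator rule can be sketched in base R. This is only an illustration under my reading of the data: parse_by_separator() is a made-up helper name, not something from the thread or from any package, and the two format strings assume that "-" always means day-month-two-digit-year and "/" always means month/day/four-digit-year.

```r
# Hypothetical helper (not from the thread): pick the date format from the separator.
parse_by_separator <- function(x) {
  x <- as.character(x)                 # read.table() may have produced a factor
  out <- rep(as.Date(NA), length(x))   # start with an all-NA Date vector
  dash  <- grepl("-", x, fixed = TRUE)
  slash <- grepl("/", x, fixed = TRUE)
  # Assumption: "-" entries are day-month-year with a two-digit year
  out[dash]  <- as.Date(x[dash],  format = "%d-%m-%y")
  # Assumption: "/" entries are month/day/year with a four-digit year
  out[slash] <- as.Date(x[slash], format = "%m/%d/%Y")
  out                                  # anything with neither separator stays NA
}

dates  <- c("19-10-02", "19-01-15", "11/19/2006", "9/9/2011", "1/3/1990", "DEX")
parsed <- parse_by_separator(dates)
parsed                                 # "1/3/1990" is read as 3 January 1990

# One common output format for the rows that parsed, here the %m-%d-%Y Val asked for
format(parsed[!is.na(parsed)], "%m-%d-%Y")
```

Because the format is decided by the separator rather than by trial and error, an ambiguous entry such as 1/3/1990 is never tried as dmy at all.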
But I am so much in awe of Rui's skills with R, and those of most of the regular commentators here, that I submit
this a little nervously!
Many thanks to all who teach me so much here. It is lovely, if I am correct, to contribute for a change!
Chris
----- Original Message -----
> From: "Rui Barradas" <ruipbarradas using sapo.pt>
> To: "Val" <valkremk using gmail.com>, "r-help using R-project.org (r-help using r-project.org)" <r-help using r-project.org>
> Sent: Tuesday, 21 January, 2020 00:40:29
> Subject: Re: [R] Mixed format
> Hello,
>
> The following strategy works with your data.
> It uses the fact that most dates are in one of 3 formats, dmy, mdy, ymd.
> It tries those formats one by one, after each try looks for NA's in the
> new column.
>
>
> # first round, format is dmy
> DFX$dnew <- lubridate::dmy(DFX$ddate)
> na <- is.na(DFX$dnew)
>
> # second round, format is mdy
> DFX$dnew[na] <- lubridate::mdy(DFX$ddate[na])
> na <- is.na(DFX$dnew)
>
> # last round, format is ymd
> DFX$dnew[na] <- lubridate::ymd(DFX$ddate[na])
>
> # remove what didn't fit any format
> DFX <- DFX[!is.na(DFX$dnew), ]
> DFX
>
>
> Hope this helps,
>
> Rui Barradas
>
> At 22:58 on 20/01/20, Val wrote:
>> Hi All,
>>
>> I have a data frame where one column is a mixed date format,
>> a date in the form "%m-%d-%y" and "%m/%d/%Y", also some are not in date format.
>>
>> Is there a way to delete the rows that contain non-dates and
>> standardize the dates in one date format like %m-%d-%Y?
>> Please see my sample data and desired output
>>
>> DFX<-read.table(text="name ddate
>> A 19-10-02
>> B 22-11-20u
>> C 19-01-15
>> D 11/19/2006
>> F 9/9/2011
>> G 12/29/2010
>> H DEX",header=TRUE)
>>
>> Desired output
>> name ddate
>> A 19-10-2002
>> B 22-11-2020
>> C 19-01-2015
>> D 11-19-2006
>> F 09-09-2011
>> G 12-29-2010
>>
>> Thank you
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
--
Chris Evans <chris using psyctc.org> Visiting Professor, University of Sheffield <chris.evans using sheffield.ac.uk>
I do some consultation work for the University of Roehampton <chris.evans using roehampton.ac.uk> and other places
but <chris using psyctc.org> remains my main Email address. I have a work web site at:
https://www.psyctc.org/psyctc/
and a site I manage for CORE and CORE system trust at:
http://www.coresystemtrust.org.uk/
I have "semigrated" to France, see:
https://www.psyctc.org/pelerinage2016/semigrating-to-france/
That page will also take you to my blog which started with earlier joys in France and Spain!
If you want to book to talk, I am trying to keep that to Thursdays and my diary is at:
https://www.psyctc.org/pelerinage2016/ceworkdiary/
Beware: French time, generally an hour ahead of UK.