[R] extract date
Prof Brian Ripley
ripley at stats.ox.ac.uk
Tue Apr 5 11:23:26 CEST 2005
On Tue, 5 Apr 2005, Petr Pikal wrote:
> Dear all,
>
> please, is there any way to extract a date from data
> that look like this:
Yes, if you delimit all the possibilities.
> ....
> "Date: Sat, 21 Feb 04 10:25:43 GMT"
> "Date: 13 Feb 2004 13:54:22 -0600"
> "Date: Fri, 20 Feb 2004 17:00:48 +0000"
> "Date: Fri, 14 Jun 2002 16:22:27 -0400"
> "Date: Wed, 18 Feb 2004 08:53:56 -0500"
> "Date: 20 Feb 2004 02:18:58 -0600"
> "Date: Sun, 15 Feb 2004 16:01:19 +0800"
> ....
>
> I used
>
> strptime(paste(substr(x,12,13), substr(x,15,17), substr(x,19,22),
> sep="-"), format="%d-%b-%Y")
>
> which suits lines 3:5 and 7 (these are the most common in my
> dataset) but obviously does not work with the other lines.
For those examples, in character vector 'dates' (without quotes):
> nd <- gsub("^[^0-9]*([0-9]+) ([A-Za-z]+) ([0-9]+).*",
"\\1 \\2 \\3", dates)
> strptime(nd, "%d %b %y")
[1] "2004-02-21" "2020-02-13" "2020-02-20" "2020-06-14" "2020-02-18"
[6] "2020-02-20" "2020-02-15"
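You should be able to amend the regexp for a wider range of forms, but
your first line is ambiguous (2004 or 2021?) so there are limits. Note that
%y reads only two digits, which is why the four-digit years above come out
as 2020; those elements need %Y instead.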
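A minimal sketch of one way to cope with both year widths, assuming the only forms present are those in the examples: extract "day month year" with the same regexp, then choose the format per element. The names 'four' and 'res' are made up for this illustration, and setting LC_TIME to "C" is an assumption so that %b matches English month abbreviations. A two-digit year is read here as the year (so "21 Feb 04" becomes 2004), which is a guess about the data, not something the strings themselves settle.

```r
# Assumption: English month abbreviations, so force the C locale for %b.
invisible(Sys.setlocale("LC_TIME", "C"))

dates <- c("Date: Sat, 21 Feb 04 10:25:43 GMT",
           "Date: 13 Feb 2004 13:54:22 -0600",
           "Date: Fri, 20 Feb 2004 17:00:48 +0000")

# Reduce each line to "day month year", dropping everything else.
nd <- gsub("^[^0-9]*([0-9]+) ([A-Za-z]+) ([0-9]+).*", "\\1 \\2 \\3", dates)

# TRUE where the year field has four digits, FALSE where it has fewer.
four <- grepl(" [0-9]{4}$", nd)

# Parse each group with the matching year format.
res <- rep(as.Date(NA), length(nd))
res[four]  <- as.Date(nd[four],  format = "%d %b %Y")
res[!four] <- as.Date(nd[!four], format = "%d %b %y")

format(res)
# [1] "2004-02-21" "2004-02-13" "2004-02-20"
```

Any element that matches neither format is left as NA, which makes unexpected forms easy to find with is.na(res).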
> If there is no straightforward solution I can live with what I use now but
> some automagical function like
>
> give.me.date.from.my.string.regardles.of.formating(x)
> would be great.
It would be impossible: when Americans write 07/04/2004 they do not mean
April 7th.
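To illustrate the ambiguity, here is a small sketch in which the same string parses without complaint under both conventions and gives two different dates. The variable names 'x', 'us' and 'eu' are made up for this example.

```r
x <- "07/04/2004"

# US convention: month/day/year.
us <- as.Date(x, format = "%m/%d/%Y")

# European convention: day/month/year.
eu <- as.Date(x, format = "%d/%m/%Y")

format(c(us, eu))
# [1] "2004-07-04" "2004-04-07"
```

Nothing in the string says which reading is intended, so the format has to come from knowledge of the source.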
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595