[R] extract date

Gabor Grothendieck ggrothendieck at gmail.com
Tue Apr 5 13:40:55 CEST 2005


I just started using gmail, and one thing that I thought would
be annoying but is sometimes actually interesting is the
ads on the right-hand side.  They are keyed off the content
of the email and, in the case of your post, produced:

http://www.visibone.com/regular-expressions/?via=google120

http://www.regexpbuddy.com

The first one is advertising a JavaScript reference card (which
I happen to own and is excellent); in any case, the contents
of the regexp part of the reference card are fully reproduced on
the web page and include dozens of examples of regexps that
you could try.  I haven't explored the other web site.

Although I have not read it, there is a book called Mastering
Regular Expressions.

By the way, here is an alternative to calculating nd in Prof.
Ripley's post, just to give you something else to play with.
I think I prefer his solution, but this one is arguably a bit
simpler.  The three portions separated by the two bars
are each deleted if they are present.  gsub replaces every
match rather than only the first, so it does not stop after
deleting the first one:

nd <- gsub("Date: |.*, | ..:.*$", "", dates)
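If you want to try it, here is a small sketch using the sample lines
from your original post.  Note that this gsub leaves the year as
written, so nd holds a mix of two-digit and four-digit years.  The
names fmt and parsed are just mine for illustration, and I am assuming
English month abbreviations, hence the locale call:

```r
# Sample header lines copied from the quoted message below.
dates <- c("Date: Sat, 21 Feb 04 10:25:43 GMT",
           "Date: 13 Feb 2004 13:54:22 -0600",
           "Date: Fri, 20 Feb 2004 17:00:48 +0000",
           "Date: Fri, 14 Jun 2002 16:22:27 -0400",
           "Date: Wed, 18 Feb 2004 08:53:56 -0500",
           "Date: 20 Feb 2004 02:18:58 -0600",
           "Date: Sun, 15 Feb 2004 16:01:19 +0800")

# Assumption: month abbreviations are English; %b is locale dependent.
invisible(Sys.setlocale("LC_TIME", "C"))

# Delete the "Date: " prefix, any weekday up to ", ", and the time onward.
nd <- gsub("Date: |.*, | ..:.*$", "", dates)

# Pick %Y for four-digit years and %y for two-digit years, per element.
fmt <- ifelse(grepl(" [0-9]{4}$", nd), "%d %b %Y", "%d %b %y")
parsed <- as.Date(strptime(nd, fmt, tz = "UTC"))
```

Choosing the format per element matters because %y reads at most
two digits, so applying it to "2004" would pick up only the "20".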

On Apr 5, 2005 7:22 AM, Petr Pikal <petr.pikal at precheza.cz> wrote:
> Dear Prof. Ripley
> 
> Thank you for your answer. After some trial and error I ended up
> with a suitable extraction function which gives me a substantial
> increase in positive answers.
> 
> Nevertheless I definitely need to gain more practice in regular
> expressions, but from the help page I can grasp only easy things. Is
> there any "Regular expressions for dummies" available?
> 
> Best regards
> Petr Pikal
> 
> On 5 Apr 2005 at 10:23, Prof Brian Ripley wrote:
> 
> > On Tue, 5 Apr 2005, Petr Pikal wrote:
> >
> > > Dear all,
> > >
> > > please, is there any possibility how to extract a date from data
> > > which are like this:
> >
> > Yes, if you delimit all the possibilities.
> >
> > > ....
> > > "Date: Sat, 21 Feb 04 10:25:43 GMT"
> > > "Date: 13 Feb 2004 13:54:22 -0600"
> > > "Date: Fri, 20 Feb 2004 17:00:48 +0000"
> > > "Date: Fri, 14 Jun 2002 16:22:27 -0400"
> > > "Date: Wed, 18 Feb 2004 08:53:56 -0500"
> > > "Date: 20 Feb 2004 02:18:58 -0600"
> > > "Date: Sun, 15 Feb 2004 16:01:19 +0800"
> > > ....
> > >
> > > I used
> > >
> > > strptime(paste(substr(x,12,13), substr(x,15,17), substr(x,19,22),
> > > sep="-"), format="%d-%b-%Y")
> > >
> > > which suits to lines 3:5 and 7 (such are the most common in my
> > > dataset) but obviously does not work with other lines.
> >
> > For those examples, in character vector 'dates' (without quotes):
> >
> > > nd <- gsub("^[^0-9]*([0-9]+) ([A-Za-z]+) [0-9]*([0-9][0-9]) .*",
> >               "\\1 \\2 \\3", dates)
> > > strptime(nd, "%d %b %y")
> > [1] "2004-02-21" "2004-02-13" "2004-02-20" "2002-06-14" "2004-02-18"
> > [6] "2004-02-20" "2004-02-15"
> >
> > You should be able to amend the regexp for a wider range of forms, but
> > your first line is ambiguous (2004 or 2021?) so there are limits.
> >
> > > If there is no straightforward solution I can live with what I use now
> > > but some automagical function like
> > >
> > > give.me.date.from.my.string.regardles.of.formating(x)
> > > would be great.
> >
> > It would be impossible: when Americans write 07/04/2004 they do not
> > mean April 7th.
> >
> > --
> > Brian D. Ripley,                  ripley at stats.ox.ac.uk
> > Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> > University of Oxford,             Tel:  +44 1865 272861 (self)
> > 1 South Parks Road,                     +44 1865 272866 (PA)
> > Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide!
> > http://www.R-project.org/posting-guide.html
> 
> Petr Pikal
> petr.pikal at precheza.cz



