[R] Extracting text from a character string
Marc Schwartz
marc_schwartz at comcast.net
Fri Mar 9 21:44:47 CET 2007
On Fri, 2007-03-09 at 15:23 -0500, Shawn Way wrote:
> I have a set of character strings like below:
>
> > data3[1]
> [1] "CB01_0171_03-27-2002-(Sample 26609)-(126)"
> >
>
> I am trying to extract the text 03-27-2002 and convert this into a date
> for the same record. I keep looking at the grep function, however I
> cannot quite get it to work.
>
> grep("\d\d-\d\d-\d\d\d\d",data3[1],perl=TRUE,value=TRUE)
>
> Any hints?
At least two different ways:
Vec <- "CB01_0171_03-27-2002-(Sample 26609)-(126)"
1. Using substr(), if your source vector has a fixed format:
# Get the 11th through the 20th characters
> substr(Vec, 11, 20)
[1] "03-27-2002"
2. Using sub() for a more generalized approach:
# Use a back reference ("\\1") to return only the text matched by the
# pattern inside the parens
> sub(".+([0-9]{2}-[0-9]{2}-[0-9]{4}).+", "\\1", Vec)
[1] "03-27-2002"
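As for why the grep() call in the question does not do what you want: grep() with value = TRUE returns the elements of the vector that contain a match, not the matched text itself, so you get the whole string back. Also, inside an R string literal each backslash has to be doubled ("\\d" rather than "\d"). If you want to pull out just the matched text directly, here is a minimal sketch. It assumes an R version that provides regmatches(), which older releases do not have; the names pat, m and date_txt are only illustrative:

```r
Vec <- "CB01_0171_03-27-2002-(Sample 26609)-(126)"

# Pattern for two digits, two digits, four digits, separated by hyphens
pat <- "[0-9]{2}-[0-9]{2}-[0-9]{4}"

# regexpr() locates the first match in each element;
# regmatches() then extracts the matched text itself
m <- regexpr(pat, Vec)
date_txt <- regmatches(Vec, m)
date_txt
```

Note that regmatches() silently drops elements with no match, so on a longer vector the result can be shorter than the input.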
See ?substr, ?sub and ?regex
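For the second half of the question, converting the extracted text to a date, something along these lines should work. This is a sketch that assumes the text is always in month-day-year order, as in your example; date_txt and d are just illustrative names:

```r
# Text as extracted from the original string
date_txt <- "03-27-2002"

# %m = month, %d = day, %Y = four-digit year
d <- as.Date(date_txt, format = "%m-%d-%Y")

class(d)   # "Date"
format(d)  # "2002-03-27"
```

Both sub() and as.Date() are vectorized, so the same two steps can be applied to the whole of data3 at once. See ?as.Date and ?strptime for the format codes.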
HTH,
Marc Schwartz