[R] Working with string
Bogaso Christofer
bogaso.christofer at gmail.com
Thu Jul 7 19:33:14 CEST 2011
Thanks, Marc, for your reply and detailed explanation. I agree that using
the stringr package does not gain me anything really important here.
However, I have already written a long code-book with it and do not want to
change anything now. I also find its function names more meaningful.
I have one more query. Does "\\1" mean that I want to return the matched
string itself (instead of replacing it with something else)? What other
related constructs are there? Could you point me to an online reference?
Thanks,
-----Original Message-----
From: Marc Schwartz [mailto:marc_schwartz at me.com]
Sent: 07 July 2011 21:54
To: Bogaso Christofer
Cc: r-help at r-project.org
Subject: Re: [R] Working with string
On Jul 7, 2011, at 11:21 AM, Bogaso Christofer wrote:
> Hi there, I have to extract a relevant portion from a string that is a
> mix of numbers and characters. The string has the following sequence:
>
>
>
> Some String - Some numerical - "c/C" (or "p/P") - then again some set
> of numbers.
>
>
>
> Examples of such strings are "fdahsdfcha163517253c463278643",
> "fdahsdfcha163517253C463278643", "fdahsdfcha163517253P463278643",
> "fdahsdfcha163517253p463278643" etc.
>
> I have tried using the latest stringr package to accomplish that. Here is
> my try:
>
>
>
>> library(stringr)
>
>> str_extract("fdahsdfcha163517253c463278643", "[c]")
>
> [1] "c"
>
>
>
> But it seems that the above code fetches the "c" from "fdahsdfcha" only. My
> goal is to find out which of "C/c/P/p" sits between the two sets of
> numbers. Can somebody help me with how to do that? I would like to use
> stringr syntax because I am already using a lot of other functions from
> that package, so doing it with stringr would be good for consistency.
>
>
>
> Thanks for your help.
I don't use 'stringr', but you can get the desired result using ?gsub:
x <- c("fdahsdfcha163517253c463278643", "fdahsdfcha163517253C463278643",
       "fdahsdfcha163517253P463278643", "fdahsdfcha163517253p463278643")
gsub(".+[0-9]+([cCpP])[0-9]+", "\\1", x)
[1] "c" "C" "P" "p"
The regex in the first argument tells gsub to find a sequence of any
characters, followed by a sequence of digits, followed by a single 'c',
'C', 'p' or 'P', and finally followed by another sequence of digits.
Surrounding the [cCpP] in parens creates a capture group, which allows us to
use a 'back reference': the "\\1" in the second argument stands for whatever
was matched within the first pair of parens. Since the regex matches the
entire string, the entire string is replaced by that single character.
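To make the back reference idea a little more concrete, here is a minimal
sketch in base R. The second pattern and the variable names are only for
illustration; they assume the strings always have the shape
letters-digits-letter-digits described above.

```r
x <- c("fdahsdfcha163517253c463278643", "fdahsdfcha163517253P463278643")

# "\\1" is replaced by whatever the first pair of parens matched.
# The pattern covers the whole string, so the whole string is
# replaced by that one letter.
one_group <- sub(".+[0-9]+([cCpP])[0-9]+", "\\1", x)

# With several groups, "\\1", "\\2", ... refer to them from left to right.
# Here group 2 is the first number, group 3 the letter, group 4 the
# second number.
many_groups <- sub("^([[:alpha:]]+)([0-9]+)([cCpP])([0-9]+)$",
                   "\\3: \\2 / \\4", x)

# If the pattern does not match, sub() returns the input unchanged.
no_match <- sub(".+[0-9]+([cCpP])[0-9]+", "\\1", "abc")

print(one_group)
print(many_groups)
print(no_match)
```

Back references are described in ?regex and in the Details section of
?sub, which are probably the best places to start reading.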
From a brief review of the stringr manual, it looks like str_extract()
supports the use of a regex for the pattern argument, but does not support
the use of back references. It looks like str_replace_all() is a wrapper for
gsub(), so you may want to look at that function and the examples for it.
Thus, the syntax might be something like:
str_replace_all(x, ".+[0-9]+([cCpP])[0-9]+", "\\1")
in which case I am not sure what you are really saving by using it rather
than calling gsub() directly.
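If staying within stringr matters for consistency, the following is a rough
sketch of two ways it might be done. It assumes stringr is installed and
that str_match() and str_replace() behave as documented in your version; the
variable names are only for illustration.

```r
library(stringr)  # assumption: stringr is installed

x <- c("fdahsdfcha163517253c463278643", "fdahsdfcha163517253C463278643",
       "fdahsdfcha163517253P463278643", "fdahsdfcha163517253p463278643")

# str_match() returns a character matrix: column 1 holds the full match,
# column 2 holds the text captured by the first pair of parens.
m <- str_match(x, "[0-9]+([cCpP])[0-9]+")
letters_found <- m[, 2]

# str_replace() with a back reference, mirroring the gsub() call above.
letters_replaced <- str_replace(x, ".+[0-9]+([cCpP])[0-9]+", "\\1")

print(letters_found)
print(letters_replaced)
```

The str_match() form avoids having to write a pattern that covers the whole
string, since only the captured group is pulled out of the result.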
HTH,
Marc Schwartz