[R] String manipulation

rex.dwyer at syngenta.com
Wed Feb 16 15:58:01 CET 2011


A quick way to do this is to replace \d and \D with the character classes [0-9.]
and [^0-9.].  This assumes that there is no scientific notation and that nothing like 123.45.678 appears in the string.  You also did not account for a leading minus sign.
The book Mastering Regular Expressions is probably worth the expense if you are going to be doing a lot of this, even though similar content can be found online.
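
As a rough sketch of that substitution: the pattern in the strapply call should become "([^0-9.]+)([0-9.]+)([^0-9.]+)([0-9.]+)". The base R version below does the same kind of split without gsubfn and for any number of groups. The names split_alnum and split_signed are hypothetical helpers made up for this illustration, and the second one reflects just one possible reading of "leading minus sign", namely that a minus directly before a digit belongs to the number.

```r
# Sketch only: split_alnum and split_signed are hypothetical helper names,
# not functions from gsubfn or base R.

# A token is either a run of digits/periods or a run of anything else.
# Assumes no scientific notation and nothing like 123.45.678 in the string.
split_alnum <- function(x) {
  regmatches(x, gregexpr("[0-9.]+|[^0-9.]+", x))[[1]]
}

print(split_alnum("ABCFR34564IJVEOJC3434.36453"))
# [1] "ABCFR"      "34564"      "IJVEOJC"    "3434.36453"
print(split_alnum("ABCFR34564.354IJVEOJC3434.36453"))
# [1] "ABCFR"      "34564.354"  "IJVEOJC"    "3434.36453"

# Stricter variant (assumption: a number is an optional minus, digits, and
# at most one decimal part). The negative lookahead stops the text run
# before a minus that starts a number, so perl = TRUE is required.
split_signed <- function(x) {
  pattern <- "-?[0-9]+(?:\\.[0-9]+)?|(?:(?!-?[0-9]).)+"
  regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
}

parts <- split_signed("ABCFR-34564.354IJVEOJC3434.36453")
print(parts)
# [1] "ABCFR"      "-34564.354" "IJVEOJC"    "3434.36453"

# Convert the numeric tokens once they are separated.
nums <- as.numeric(parts[c(2, 4)])
```

Note that the stricter variant treats any minus followed by a digit as a sign, so "A5-3" splits into "A", "5" and "-3"; whether that is what you want depends on your data.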

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Megh Dal
Sent: Sunday, February 13, 2011 4:42 PM
To: Gabor Grothendieck
Cc: r-help at r-project.org
Subject: Re: [R] String manipulation

Hi Gabor, thanks (and Jim as well) for your suggestion. However, this does not
work properly for the following string:

> MyString <- "ABCFR34564IJVEOJC3434.36453"
> strapply(MyString, "(\\D+)(\\d+)(\\D+)(\\d+)", c)[[1]]
[1] "ABCFR"   "34564"   "IJVEOJC" "3434"

So when the 4th group is a decimal number, the decimal point and the digits
after it are not captured.

The same kind of unintended result occurs here as well:

> MyString <- "ABCFR34564.354IJVEOJC3434.36453"
> strapply(MyString, "(\\D+)(\\d+)(\\D+)(\\d+)", c)[[1]]
[1] "ABCFR"   "34564"   "."       "354"     "IJVEOJC" "3434"    "."
"36453"
Can you please tell me how I can modify that?

Thanks,


On Sun, Feb 13, 2011 at 11:10 PM, Gabor Grothendieck <
ggrothendieck at gmail.com> wrote:

>  On Sun, Feb 13, 2011 at 10:27 AM, Megh Dal <megh700004 at gmail.com> wrote:
> > Please consider following string:
> >
> > MyString <- "ABCFR34564IJVEOJC3434"
> >
> > Here you see that there are 4 groups in the above string. The 1st and 3rd groups
> > are English letters and the 2nd and 4th are numeric. Given a string, how can
> > I separate out those 4 groups?
> >
>
> Try this.  "\\D+" and "\\d+" match non-digits and digits respectively.
>  The portions within parentheses are captures and are passed to the c
> function.  Like R's strsplit, strapply returns a list with a component
> per element of MyString, but MyString only has one element, so we get
> its contents using  [[1]].
>
> > library(gsubfn)
> > strapply(MyString, "(\\D+)(\\d+)(\\D+)(\\d+)", c)[[1]]
> [1] "ABCFR"   "34564"   "IJVEOJC" "3434"
>
> Alternately we could convert the relevant portions to numbers at the
> same time.  ~ list(...) is interpreted as a  function whose body is
> the right hand side of the ~ and whose arguments are the free
> variables, i.e. s1, s2, s3 and s4.
>
> strapply(MyString, "(\\D+)(\\d+)(\\D+)(\\d+)", ~ list(s1,
> as.numeric(s2), s3, as.numeric(s4)))[[1]]
>
> See http://gsubfn.googlecode.com for more.
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>
