[R] String manipulation
Gabor Grothendieck
ggrothendieck at gmail.com
Sun Feb 13 18:40:13 CET 2011
On Sun, Feb 13, 2011 at 10:27 AM, Megh Dal <megh700004 at gmail.com> wrote:
> Please consider following string:
>
> MyString <- "ABCFR34564IJVEOJC3434"
>
> Here you can see that there are 4 groups in the above string. The 1st and 3rd
> groups consist of English letters and the 2nd and 4th are numeric. Given a
> string, how can I separate out those 4 groups?
>
Try this. "\\D+" and "\\d+" match non-digits and digits respectively.
The portions within parentheses are captures; the text matched by each
one is passed as a separate argument to the c function, which combines
them into a character vector. Like R's strsplit, strapply returns a list
with one component per element of MyString. MyString has only one
element, so we get its contents using [[1]].
> library(gsubfn)
> strapply(MyString, "(\\D+)(\\d+)(\\D+)(\\d+)", c)[[1]]
[1] "ABCFR" "34564" "IJVEOJC" "3434"
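For readers who would rather not install gsubfn, the same captures can be pulled out with base R's regexec() and regmatches(). This is a sketch of a substitute technique, not part of the original reply: regmatches() on a regexec() result returns the full match followed by each capture, so the first element is dropped.

```r
MyString <- "ABCFR34564IJVEOJC3434"

# Same pattern as above: non-digits, digits, non-digits, digits
pattern <- "(\\D+)(\\d+)(\\D+)(\\d+)"

# regexec() records the positions of the whole match and of each
# parenthesized capture; regmatches() extracts the corresponding text.
m <- regmatches(MyString, regexec(pattern, MyString))

# One list component per element of MyString; within it, element 1 is
# the full match and elements 2..5 are the four captures.
groups <- m[[1]][-1]
print(groups)
```

As with strapply, the [[1]] is needed only because the result is a list with one component per input string.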
Alternatively, we could convert the relevant portions to numbers at the
same time. The formula ~ list(...) is interpreted as a function whose
body is the right-hand side of the ~ and whose arguments are its free
variables, i.e. s1, s2, s3 and s4.
> strapply(MyString, "(\\D+)(\\d+)(\\D+)(\\d+)",
+   ~ list(s1, as.numeric(s2), s3, as.numeric(s4)))[[1]]
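To make the formula shorthand concrete, here is a sketch of the ordinary function it stands for, applied with base R rather than strapply so that it runs without gsubfn. The name convert() is hypothetical, the argument order assumes the free variables are taken in order of appearance, and the regexec()/regmatches()/do.call() pipeline is a substitute for strapply, not the original code.

```r
MyString <- "ABCFR34564IJVEOJC3434"
pattern <- "(\\D+)(\\d+)(\\D+)(\\d+)"

# Hypothetical hand-written equivalent of the formula
#   ~ list(s1, as.numeric(s2), s3, as.numeric(s4))
# with the free variables s1..s4 as its arguments, in order of appearance.
convert <- function(s1, s2, s3, s4) {
  list(s1, as.numeric(s2), s3, as.numeric(s4))
}

# Base R stand-in for strapply: extract the captures, drop the full match,
# then pass the four captures to convert() as separate arguments.
captures <- regmatches(MyString, regexec(pattern, MyString))[[1]][-1]
result <- do.call(convert, as.list(captures))
str(result)
```

The letter groups stay character strings while the digit groups come back as numbers.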
See http://gsubfn.googlecode.com for more.
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com