[R] Editing Strings in R
Gabor Grothendieck
ggrothendieck at myway.com
Fri Jul 30 05:25:24 CEST 2004
Marc Schwartz <MSchwartz <at> MedAnalytics.com> writes:
>
> On Thu, 2004-07-29 at 21:08, Gabor Grothendieck wrote:
> > Bulutoglu Dursun A Civ AFIT/ENC <Dursun.Bulutoglu <at> afit.edu> writes:
> >
> > >
> > > I was wondering if there is a way of editing strings in R. I
> > > have a set of strings and each string is a row of numbers and parentheses.
> > > For example the first row is:
> > > (0 2)(3 4)(7 9)(5 9)(1 5)
> > > and I have a thousand or so such rows. I was wondering how I
> > > could get the corresponding string obtained by adding 1 to all the
> > > numbers in the string above.
> >
> > First do the 1 character translations simultaneously using chartr and
> > then use gsub for the remaining one to two character translation:
> >
> > gsub("0","10",chartr("0123456789","1234567890","(0 2)(3 4)(7 9)(5 9)(1 5)"))
>
> Gabor,
>
> One problem: Multi-digit numbers in the source string:
>
> > gsub("0","10",chartr("0123456789","1234567890",
> "(10 99)(3 4)(7 9)(5 9)(1 5)"))
> [1] "(21 1010)(4 5)(8 10)(6 10)(2 6)"
>
> Note the first number "10" gets transformed to "21" and the "99" goes to
> "1010".
>
> I made a quick update to NewRow, which is not faster, but gets it to two
> lines, instead of three, and is a bit cleaner:
>
> NewRow <- function(x)
> {
>   TempMat <- matrix(as.numeric(unlist(strsplit(x, "([\\(\\) ])"))),
>                     ncol = 3, byrow = TRUE) + 1
>
>   paste("(", TempMat[, 2], " ", TempMat[, 3], ")", sep = "",
>         collapse = "")
> }
>
> Note that with multi digit numbers, it gives a correct result:
>
> > NewRow("(10 99)(101 4)(7 9)(5 9)(1 5)")
> [1] "(11 100)(102 5)(8 10)(6 10)(2 6)"

The above assumes a particular pattern of parentheses, based on
the poster's example, just as mine assumed one digit numbers based
on the poster's example. Both our examples assume the numbers
are non-negative integers.

The poster can advise us on which additional assumptions, if any,
are allowable but, just in case, here is a one line solution that
handles multi-digit numbers and does not assume a particular pattern
of parentheses and spaces.

For a number, say 99, the gsub replaces it with ",99+1," and
the inner paste adds c(" to the front and ") to the end making it
a valid R expression which we then evaluate and finally paste back
together using the outer paste:

R> line <- "(10 99)(101 4)(7 9)()((5 9)(1 5))" # test data
R> paste(eval(parse(text = paste('c("', gsub("([0-9]+)", '",\\1+1,"', line),
     '")', sep = ""))), collapse = "")
[1] "(11 100)(102 5)(8 10)()((6 10)(2 6))"
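
If evaluating generated text is undesirable, the same idea can probably be
written with gregexpr and regmatches, which locate each run of digits and
replace it in place. This is only a sketch under the same assumption of
non-negative integers; the name add_one is made up for illustration, and
regmatches<- is from current versions of R. It is vectorized, so it can be
applied to all of the rows at once, and it goes through as.integer so that
the incremented values are written back as plain digits.

```r
# Hypothetical helper (not from the thread): add 1 to every run of
# digits in each element of a character vector.
add_one <- function(x) {
  m <- gregexpr("[0-9]+", x)            # positions of every run of digits
  regmatches(x, m) <- lapply(regmatches(x, m), function(d) {
    as.character(as.integer(d) + 1L)    # increment, then back to text
  })
  x                                     # everything else is left untouched
}

rows <- c("(0 2)(3 4)(7 9)(5 9)(1 5)",
          "(10 99)(101 4)(7 9)()((5 9)(1 5))")
add_one(rows)
```

Because only the matched digits are rewritten, the parentheses and spaces
do not have to follow any particular pattern.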