[R] Editing Strings in R

Gabor Grothendieck ggrothendieck at myway.com
Fri Jul 30 05:25:24 CEST 2004


Marc Schwartz <MSchwartz <at> MedAnalytics.com> writes:

> 
> On Thu, 2004-07-29 at 21:08, Gabor Grothendieck wrote:
> > Bulutoglu Dursun A Civ AFIT/ENC <Dursun.Bulutoglu <at> afit.edu> writes:
> > 
> > > 
> > > 	I was wondering if there is a way of editing strings in R. I
> > > have a set of strings, and each string is a row of numbers and parentheses.
> > > 	For example, the first row is: 
> > > 	(0 2)(3 4)(7 9)(5 9)(1 5)
> > > 	and I have a thousand or so such rows. I was wondering how I
> > > could get the corresponding string obtained by adding 1 to all the
> > > numbers in the string above.
> > 
> > First do the one-character translations simultaneously using chartr, and
> > then use gsub for the remaining one-to-two-character translation (chartr
> > maps 9 to 0, and gsub then turns each 0 into 10):
> > 
> > gsub("0","10",chartr("0123456789","1234567890","(0 2)(3 4)(7 9)(5 9)(1 5)"))
> 
> Gabor, 
> 
> One problem:  Multi-digit numbers in the source string:
> 
> > gsub("0","10",chartr("0123456789","1234567890",
>        "(10 99)(3 4)(7 9)(5 9)(1 5)"))
> [1] "(21 1010)(4 5)(8 10)(6 10)(2 6)"
> 
> Note the first number "10" gets transformed to "21" and the "99" goes to
> "1010".
> 
> I made a quick update to NewRow. It is not faster, but it gets the function
> down to two lines instead of three and is a bit cleaner:
> 
> NewRow <- function(x)
> {
>   TempMat <- matrix(as.numeric(unlist(strsplit(x, "([\\(\\) ])"))), 
>                     ncol = 3, byrow = TRUE) + 1
> 
>   paste("(", TempMat[, 2], " ", TempMat[, 3], ")", sep = "", 
>         collapse = "")
> }
> 
> Note that with multi-digit numbers, it gives the correct result:
> 
> > NewRow("(10 99)(101 4)(7 9)(5 9)(1 5)")
> [1] "(11 100)(102 5)(8 10)(6 10)(2 6)"
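
The original question mentions a thousand or so rows. A minimal sketch, assuming the rows are already held in a character vector (the names `rows` and `new_rows` are hypothetical, introduced only for illustration), applies NewRow to every element with vapply. It inherits NewRow's assumption that each pair looks like "(a b)".

```r
# NewRow as defined above: split on "(", ")" and " ", add 1, rebuild the pairs.
NewRow <- function(x)
{
  TempMat <- matrix(as.numeric(unlist(strsplit(x, "([\\(\\) ])"))),
                    ncol = 3, byrow = TRUE) + 1
  paste("(", TempMat[, 2], " ", TempMat[, 3], ")", sep = "",
        collapse = "")
}

# Hypothetical input; in practice the rows might come from readLines().
rows <- c("(0 2)(3 4)(7 9)(5 9)(1 5)",
          "(10 99)(101 4)(7 9)(5 9)(1 5)")

# vapply() checks that each call returns exactly one character string.
new_rows <- vapply(rows, NewRow, character(1), USE.NAMES = FALSE)
print(new_rows)
# expected: "(1 3)(4 5)(8 10)(6 10)(2 6)" and "(11 100)(102 5)(8 10)(6 10)(2 6)"
```

vapply is used instead of sapply so that a row that does not produce a single string fails loudly rather than silently changing the result type.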

The above assumes a particular pattern of parentheses, based on
the poster's example, just as mine assumed single-digit numbers based
on the poster's example. Both of our solutions assume the numbers
are non-negative integers.

The poster can advise us on which additional assumptions, if any,
are allowable but, just in case, here is a one-line solution that 
handles multi-digit numbers and does not assume a particular pattern 
of parentheses and spaces.

For a number, say 99, the gsub replaces it with ",99+1,". The inner
paste then adds c(" to the front and ") to the end, which turns the
whole string into a valid R expression. We evaluate that expression
and finally paste the resulting pieces back together using the outer
paste:

R> line <- "(10 99)(101 4)(7 9)()((5 9)(1 5))"  # test data

R> paste(eval(parse(text = paste('c("', gsub("([0-9]+)", '",\\1+1,"', line),
+                                 '")', sep = ""))), collapse = "")

[1] "(11 100)(102 5)(8 10)()((6 10)(2 6))"
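
To make the mechanism concrete, the one-liner can be unrolled into its intermediate steps. The sketch below does this. The variable names `marked`, `expr_text`, `pieces`, `result`, `m` and `alt` are hypothetical and exist only for illustration. The last part swaps in a different technique: an in-place replacement with regmatches<-, which is available in later versions of R and avoids eval(parse()). Like the one-liner, it assumes non-negative integers.

```r
line <- "(10 99)(101 4)(7 9)()((5 9)(1 5))"  # same test data as above

# Step 1: turn each run of digits into its own element that adds 1;
# e.g. 99 becomes ",99+1," so the text around it is split into strings.
marked <- gsub("([0-9]+)", '",\\1+1,"', line)

# Step 2: wrap the result in c(" ... ") so that it parses as one R call,
# e.g. c("(",10+1," ",99+1,")(",101+1, ...
expr_text <- paste('c("', marked, '")', sep = "")

# Step 3: evaluate the call; c() coerces the numeric sums to character.
pieces <- eval(parse(text = expr_text))

# Step 4: glue the pieces back into a single string.
result <- paste(pieces, collapse = "")
print(result)

# Alternative (different technique): replace each digit run in place.
# Integers are used so that, e.g., 100000 is not rendered as "1e+05".
m <- gregexpr("[0-9]+", line)
alt <- line
regmatches(alt, m) <- lapply(regmatches(alt, m),
                             function(d) as.character(as.integer(d) + 1L))
print(alt)
```

Both versions should give "(11 100)(102 5)(8 10)()((6 10)(2 6))" for this input. The regmatches variant never builds or evaluates code from the data, which may matter if the input strings are not fully trusted.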




More information about the R-help mailing list