[R] Editing Strings in R

Marc Schwartz MSchwartz at MedAnalytics.com
Fri Jul 30 00:47:45 CEST 2004


On Thu, 2004-07-29 at 15:56, Bulutoglu Dursun A Civ AFIT/ENC wrote:
> 	I was wondering if there is a way of editing strings in R. I
> have a set of strings and each set is a row of numbers and parentheses.
> 	For example the first row is: 
> 	(0 2)(3 4)(7 9)(5 9)(1 5)
> 	and I have a thousand or so such rows. I was wondering how I
> could get the corresponding string obtained by adding 1 to all the
> numbers in the string above.
> 	Dursun



I don't know if this is the most efficient approach, but working on a
few hours of sleep, here goes:


NewRow <- function(x)
{
  TempRow <- as.numeric(unlist(strsplit(x, "([\\(\\) ])"))) + 1

  TempMat <- matrix(TempRow[!is.na(TempRow)], ncol = 2, byrow = TRUE)

  paste("(", TempMat[, 1], " ", TempMat[, 2], ")", sep = "", 
        collapse = "")
}


Basically, the first line splits the string into its components, using
"(", ")" and " " as regex-based delimiters, coerces the pieces to a
numeric vector and adds 1. The empty strings that the split leaves
between adjacent delimiters become NA in the coercion.

The second line drops those NA values and arranges the adjusted numbers
into a two-column matrix, one pair per row, to make the paste in line 3
easier.

Line 3 pastes each pair back together as "(a b)" and collapses the pairs
into a single string, which the function returns.
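
To make the three steps concrete, here is a rough trace of the same
calls on a shorter input. The variable names pieces and result are only
for illustration and are not part of the function above.

```r
# Step-by-step trace of the three lines inside NewRow() on a short input.
x <- "(0 2)(3 4)"

# 1. Split on "(", ")" and " "; the pieces are strings, some of them empty.
pieces <- unlist(strsplit(x, "([\\(\\) ])"))

# 2. Coerce to numeric (empty strings become NA) and add 1.
TempRow <- as.numeric(pieces) + 1

# 3. Drop the NAs and lay the numbers out as one pair per row.
TempMat <- matrix(TempRow[!is.na(TempRow)], ncol = 2, byrow = TRUE)

# 4. Rebuild the "(a b)" groups and glue them into one string.
result <- paste("(", TempMat[, 1], " ", TempMat[, 2], ")", sep = "",
                collapse = "")
print(result)
```

Printing pieces and TempRow along the way shows where the NA values
come from.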


So:

MyRow <- "(0 2)(3 4)(7 9)(5 9)(1 5)"

> NewRow(MyRow)
[1] "(1 3)(4 5)(8 10)(6 10)(2 6)"


So, if you have a bunch of these rows, you could use this function with
apply:

MyData <- matrix(c("(0 2)(3 4)(7 9)(5 9)(1 5)", 
            "(1 6)(4 5)(3 7)(4 8)(9 0)",
            "(3 5)(8 1)(4 7)(2 7)(6 1)"))

> MyData
     [,1]                       
[1,] "(0 2)(3 4)(7 9)(5 9)(1 5)"
[2,] "(1 6)(4 5)(3 7)(4 8)(9 0)"
[3,] "(3 5)(8 1)(4 7)(2 7)(6 1)"

> matrix(apply(MyData, 1, NewRow))
     [,1]                         
[1,] "(1 3)(4 5)(8 10)(6 10)(2 6)"
[2,] "(2 7)(5 6)(4 8)(5 9)(10 1)" 
[3,] "(4 6)(9 2)(5 8)(3 8)(7 2)"  
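
If the rows sit in a plain character vector rather than a one-column
matrix (say, after reading them with readLines()), something like the
following sketch should also work. MyRows and NewRows are made-up names,
and NewRow is repeated here only so the snippet runs on its own.

```r
# Same function as above, repeated so this snippet is self-contained.
NewRow <- function(x)
{
  TempRow <- as.numeric(unlist(strsplit(x, "([\\(\\) ])"))) + 1
  TempMat <- matrix(TempRow[!is.na(TempRow)], ncol = 2, byrow = TRUE)
  paste("(", TempMat[, 1], " ", TempMat[, 2], ")", sep = "",
        collapse = "")
}

# Hypothetical input: one string per element instead of a matrix.
MyRows <- c("(0 2)(3 4)(7 9)(5 9)(1 5)",
            "(1 6)(4 5)(3 7)(4 8)(9 0)",
            "(3 5)(8 1)(4 7)(2 7)(6 1)")

# Apply NewRow to each element; character(1) states that every call
# must return exactly one string.
NewRows <- vapply(MyRows, NewRow, character(1), USE.NAMES = FALSE)
print(NewRows)
```

This avoids the matrix() wrapper on both ends, at the cost of assuming
an R version that has vapply().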

I suspect somebody may come up with a more efficient approach.

For 1,200 rows:

> system.time(apply((matrix(rep(MyData, 400))), 1, NewRow))
[1] 0.29 0.00 0.33 0.00 0.00
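
One possible alternative, sketched under the assumption of a newer R
that provides regmatches() and its replacement form (I believe these
arrived in R 2.14.0, well after this post), is to increment the numbers
in place instead of splitting and rebuilding the string. AddOne is a
made-up name, and I have not timed it against NewRow, so treat it as a
different approach rather than a faster one.

```r
# Increment every run of digits in each string, leaving the
# parentheses and spaces untouched. Works on a whole character vector.
AddOne <- function(x)
{
  # Locate all runs of digits in each element of x.
  m <- gregexpr("[0-9]+", x)

  # Replace each match with its value plus 1, converted back to text.
  regmatches(x, m) <- lapply(regmatches(x, m),
                             function(v) as.character(as.numeric(v) + 1))
  x
}

print(AddOne(c("(0 2)(3 4)(7 9)(5 9)(1 5)",
               "(1 6)(4 5)(3 7)(4 8)(9 0)")))
```

Because the delimiters are never touched, this version does not assume
that the numbers come in pairs.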


(Gabor?  ;-)

HTH,

Marc Schwartz



