[R] Deleting a variable number of characters from a string

Gabor Grothendieck ggrothendieck at gmail.com
Fri Jul 16 22:23:33 CEST 2010


On Fri, Jul 16, 2010 at 1:59 PM, Davis, Brian <Brian.Davis at uth.tmc.edu> wrote:
> I have a text processing problem I'm hoping someone can help me solve.  The issue is this.
>
>  I have a character string from which I need to delete a variable number of characters.  The string itself contains the number of characters to be deleted.  That number is preceded by either a "+" or a "-".
>
> A toy example:
>
> Suppose I have
>
> x<-c("A-1CB-2GHX", "*+11gAgggTgtgggH")
>> x
> [1] "A-1CB-2GHX"       "*+11gAgggTgtgggH"
>
> What I need as output is
> "ABX" "*H"
>
> I know I can use gsub to remove the control character and the number portion with
>
> gsub("(\\-|\\+)([0-9]+)", replacement="", x)
>
> However, I can't figure out how to delete the variable number of characters after the number portion of the string.
>

Using gsubfn in the gsubfn package we match

- the - or + via [-+],
- the digits via \\d+ and
- the remaining characters (everything up to the next - or +) via [^-+]*

The digits and the remaining characters are parenthesized so that they
form back references, which are passed to the function as args 1 and 2
respectively.  gsubfn supports a formula notation for functions: the
formula below defines a function with arguments d and s whose body
drops the first d characters of s and returns the rest, which is
substituted back in:

   > library(gsubfn)
   > gsubfn("[-+](\\d+)([^-+]*)", d + s ~ substring(s, as.numeric(d) + 1), x)
   [1] "ABX" "*H"

See http://gsubfn.googlecode.com for more.
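
For comparison, here is a minimal base-R sketch of the same idea, for the
case where installing gsubfn is not an option.  It is not how gsubfn does
it internally; it just applies the same pattern with gregexpr and
regmatches<-.  The helper name strip_counted is made up for this
illustration.

```r
# Hypothetical helper (not part of gsubfn or base R): for each "+n" or "-n"
# marker, drop the marker and the first n characters that follow it.
strip_counted <- function(x) {
  pat <- "[-+]([0-9]+)([^-+]*)"
  m <- gregexpr(pat, x)                       # locate every marker + trailing segment
  regmatches(x, m) <- lapply(regmatches(x, m), function(hit) {
    d <- as.numeric(sub(pat, "\\1", hit))     # the count after the sign
    s <- sub(pat, "\\2", hit)                 # the characters following the count
    substring(s, d + 1)                       # keep what is left after dropping d chars
  })
  x
}

x <- c("A-1CB-2GHX", "*+11gAgggTgtgggH")
strip_counted(x)
# [1] "ABX" "*H"
```

As with the gsubfn call above, each segment ends at the next - or +, so a
count larger than its segment simply empties that segment and does not
carry over into the next one.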


