[R] Regexp subexpression

Dieter Menne dieter.menne at menne-biomed.de
Sat Mar 25 17:22:52 CET 2006

I can't get Perl-style subexpressions translated to R. Following, for example,
B. Ripley's


I am using sub, but it looks like an ugly substitute. Assume I want to
extract the first alpha part and the first numeric part, but only if they
are in sequence.

Do I really have to use the sub twice, first extracting the first variable,
then the second? The third example should return nothing, because it's
inverted, but it returns the whole string. I know I could check that
separately, but is there no better way?

  # The "+" sits inside the second group so that \\2 holds the whole digit run
  txt = sub("([[:alpha:]]+)([[:digit:]]+)", "\\1", patid)
  num = sub("([[:alpha:]]+)([[:digit:]]+)", "\\2", patid)
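
To make the problem concrete, here is a minimal sketch of the behaviour described above. The values of patid are hypothetical (they are not given in this post). sub() returns a string unchanged when the pattern does not match, so the separate check I mention would look roughly like this, using grepl() as the guard:

```r
# Hypothetical input; the real patid values are not shown in the post.
patid <- c("ALAN334", "AzD44", "44AzD")
pattern <- "([[:alpha:]]+)([[:digit:]]+)"

# Without a guard, the non-matching third element comes back unchanged.
unguarded <- sub(pattern, "\\1", patid)

# With a guard, non-matching elements become NA instead.
matched <- grepl(pattern, patid)
txt <- ifelse(matched, sub(pattern, "\\1", patid), NA_character_)
num <- ifelse(matched, sub(pattern, "\\2", patid), NA_character_)
```

This still runs the regular expression three times per element, which is the part that feels clumsy.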

It would be nice if the following data frame would be returned:

txt     num
ALAN    334
AzD     44
NA      NA (or "", "", but not so nice)
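
One possible way to get such a data frame in a single pass is sketched below. It is only an illustration and rests on assumptions: the patid values are hypothetical, pick() is a helper name made up here, and regexec() with regmatches() is only available in more recent versions of base R.

```r
# Hypothetical input; the real patid values are not shown in the post.
patid <- c("ALAN334", "AzD44", "44AzD")
pattern <- "([[:alpha:]]+)([[:digit:]]+)"

# regexec() locates the full match and each parenthesised group;
# regmatches() extracts them as strings, one list element per input string.
m <- regmatches(patid, regexec(pattern, patid))

# Hypothetical helper: elements without a match are character(0), so
# return NA when the requested group is not there.
pick <- function(x, i) if (length(x) >= i) x[i] else NA_character_

result <- data.frame(
  txt = vapply(m, pick, character(1), i = 2),
  num = as.integer(vapply(m, pick, character(1), i = 3)),
  stringsAsFactors = FALSE
)
print(result)
```

Each list element holds the full match first, then the groups, which is why the alpha part is at index 2 and the numeric part at index 3.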

