[R] Regexp subexpression

Gabor Grothendieck ggrothendieck at gmail.com
Sat Mar 25 20:24:26 CET 2006


Here is one more variation. This time we add an alternative, .*, that
soaks up the entire string whenever the main pattern would otherwise
fail to match. The substitution therefore always occurs, and a
non-matching string gives empty strings instead of coming back unchanged:

> pat = "^([[:alpha:]]+)([[:digit:]]+)|.*"
> sapply(sprintf("\\%d", 1:2), sub, pattern = pat, x = patid)
     \\1    \\2
[1,] "ALAN" "334"
[2,] "AzD"  "44"
[3,] ""     ""

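If NAs are needed, assign the sapply() output to result and use the same
NA step as last time, but test against the pattern without the |.*
alternative. The full pattern matches every string, so regexpr() would
never return -1:

result[regexpr("^([[:alpha:]]+)([[:digit:]]+)", patid) < 0,] <- NA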
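A self-contained sketch of this variant, including the NA step, is below. The thread never shows patid, so the three sample IDs are assumptions (the third is chosen so that it does not match), and strict is just an illustrative name for the pattern without the |.* alternative.

```r
# Hypothetical sample data: patid is never shown in the thread, so these
# IDs are assumptions (the third one deliberately does not match).
patid <- c("ALAN334", "AzD44", "123ABC")

# Letters followed by digits; the "|.*" alternative soaks up anything else,
# so sub() always replaces the whole string.
pat <- "^([[:alpha:]]+)([[:digit:]]+)|.*"

# sapply() passes "\\1" and then "\\2" as sub()'s replacement argument
# (pattern and x are fixed by name), giving one column per back-reference.
result <- sapply(sprintf("\\%d", 1:2), sub, pattern = pat, x = patid)

# The full pattern matches every string, so detect failures with the
# pattern minus "|.*" and turn those rows into NA.
strict <- "^([[:alpha:]]+)([[:digit:]]+)"
result[regexpr(strict, patid) < 0, ] <- NA
print(result)
```

The columns are named "\\1" and "\\2" because sapply() uses a character X as names by default.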

On 3/25/06, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> We could use sapply to reduce it slightly:
>
> result <- sapply(sprintf("\\%d", 1:2), sub, pattern = pat, x = patid)
> result[regexpr(pat, patid) < 0,] <- NA
>
>
> On 3/25/06, Dieter Menne <dieter.menne at menne-biomed.de> wrote:
> > Gabor Grothendieck <ggrothendieck <at> gmail.com> writes:
> >
> > >
> > > In the third case there is no match so there are no
> > > substitutions.  Handle it separately:
> > >
> > > pat = "^([[:alpha:]]+)([[:digit:]]+)"
> > > result <- cbind(txt = sub(pat, "\\1", patid), num = sub(pat, "\\2", patid))
> > > result[regexpr(pat, patid) < 0,] <- NA
> > >
> >
> > Thanks, Gabor, that's something like a compressed version of mine. My main
> > question was whether I was missing something obvious, because I found the double sub
> > messy. I am surprised that there is no
> >
> > pat = "^([[:alpha:]]+)([[:digit:]]+)"
> > mygrep(pat, patid)
> >
> > returning a list with all subexpressions.
> >
> > Dieter
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> >
>
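Something close to the wished-for mygrep can be sketched with regexec() and regmatches(), which base R gained in releases newer than this thread. regexec() records the positions of the whole match and of each parenthesised subexpression, and regmatches() extracts the corresponding substrings. The name mygrep, its NA-on-no-match convention, and the sample patid are hypothetical.

```r
# Hypothetical helper (not part of base R) in the spirit of the "mygrep"
# asked for above: returns a list with the subexpressions of each string.
mygrep <- function(pattern, x) {
  # regexec() finds the whole match plus every parenthesised group;
  # regmatches() turns those positions into substrings.
  m <- regmatches(x, regexec(pattern, x))
  # Each element is c(whole, group1, group2, ...), or character(0) when
  # there is no match: drop the whole match and use NA for failures.
  lapply(m, function(s) if (length(s)) s[-1] else NA_character_)
}

patid <- c("ALAN334", "AzD44", "123ABC")  # hypothetical sample IDs
pat <- "^([[:alpha:]]+)([[:digit:]]+)"
mygrep(pat, patid)
```

If a matrix is preferred, do.call(rbind, mygrep(pat, patid)) gives the same two-column layout as the double sub() approach, with the single NA recycled across the non-matching row.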
