[R] Regular expressions: offsets of groups
Gabor Grothendieck
ggrothendieck at gmail.com
Tue Sep 28 13:47:24 CEST 2010
On Tue, Sep 28, 2010 at 6:52 AM, Titus von der Malsburg
<malsburg at gmail.com> wrote:
> On Tue, Sep 28, 2010 at 9:46 AM, Michael Bedward
> <michael.bedward at gmail.com> wrote:
>> What Titus wants to do is akin to retrieving capturing groups from a
>> Matcher object in Java.
>
> Precisely. Here's the description:
>
> http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/Matcher.html#start(int)
>
> Gabor's lookbehind trick solves some special cases but it's not the

The only limitation is that, in the regular expressions supported by R,
you cannot have repetition in the (?<=...) portion. However, none of your
examples (neither the one you gave nor the one below) requires that,
since if the prior expression ends in X+ you can just use X. Are
you sure it does not cover all your actual situations?
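A minimal sketch of the lookbehind approach, assuming perl = TRUE (R's default TRE engine has no lookbehind support). It uses the test string from this thread. The pattern (?<=b)c shows the "X+ becomes X" rewrite: a lookbehind such as (?<=b+)c would be rejected by PCRE, but only the last b has to be seen, so a single b is enough.

```r
s <- "abcdaabbcbbb"

# Where does a "c" occur right after one or more b's?
# Only the final "b" matters, so a fixed-length lookbehind suffices.
m <- gregexpr("(?<=b)c", s, perl = TRUE)[[1]]
starts <- as.vector(m)              # 3 9
len    <- attr(m, "match.length")   # 1 1
print(starts)
print(len)

# The run of b's that follows an "a" (the group "b+" in "a(b+)").
m2 <- gregexpr("(?<=a)b+", s, perl = TRUE)[[1]]
starts2 <- as.vector(m2)            # 2 7
len2    <- attr(m2, "match.length") # 1 2
print(starts2)
print(len2)
```

Because the lookbehind consumes no characters, the reported positions and lengths refer directly to the part of interest in the original string.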
If you truly do have situations that require repetition, gregexpr
plus gsubfn will do it in one line. Parenthesize the portion of the
regular expression you want to capture and replace every character
in it with X (or some other character that does not otherwise occur).
Then find the positions and lengths of the runs of X.
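> library(gsubfn)
> # keep the text matched by the first group ("a") so that the X's stay
> # at the same positions as the captured b's in the original string
> gregexpr("X+", gsubfn("(a)(b+)", ~ paste(x, gsub(".", "X", y), sep = ""), "abcdaabbcbbb"))
[[1]]
[1] 2 7
attr(,"match.length")
[1] 1 2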
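For comparison, a base-R sketch that returns group offsets directly, closer to Java's Matcher.start(int). This is a different technique from the one proposed above: it relies on the capture.start and capture.length attributes that gregexpr() attaches when perl = TRUE, which appeared in R releases later than this thread. The helper name group_offsets is hypothetical.

```r
# Hypothetical helper: start offset and length of capture group `group`
# for every match of `pattern` in the single string `text`.
group_offsets <- function(pattern, text, group = 1) {
  m <- gregexpr(pattern, text, perl = TRUE)[[1]]
  if (m[1] == -1) {
    # no match at all: return an empty result
    return(data.frame(start = integer(0), length = integer(0)))
  }
  # capture.start / capture.length are matrices: one row per match,
  # one column per capture group
  st  <- attr(m, "capture.start")[, group]
  len <- attr(m, "capture.length")[, group]
  data.frame(start = as.vector(st), length = as.vector(len))
}

res <- group_offsets("a(b+)", "abcdaabbcbbb")
print(res)  # start 2 and 7, length 1 and 2
```

No masking step is needed here, so there is no risk of the placeholder character clashing with the input.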
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com