[R] gsub issue with consecutive pattern finds

Bert Gunter bgunter@4567 @end|ng |rom gm@||@com
Fri Mar 1 16:57:33 CET 2024


Here's another *incorrect* way to do it -- incorrect because it will
not always work, unlike Iris's correct solution. But it does not
require PERL-type matching. The idea: separate the two vowels in the
regex by a character that you know cannot appear (if there is such)
and match it optionally, e.g. with the "*" repetition specifier. I used
"?" for the optional character below (which must be escaped).

>    gsub("([aeiouAEIOU])\\?*([aeiouAEIOU])", "\\1_\\2", "aerioue")
[1] "a_eri_ou_e"
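
Note that "ou" is still left unsplit in this output: the optional
character is consumed together with both vowels, so the matches still
cannot overlap. A minimal sketch of one workaround that stays PERL-free,
assuming it is acceptable to run the plain substitution twice (the helper
name split_vowels is hypothetical, not something from this thread):

```r
# Hypothetical helper: apply the plain (non-PERL) substitution twice.
split_vowels <- function(x) {
  pattern <- "([aeiouAEIOU])([aeiouAEIOU])"
  # Pass 1 splits the pairs found scanning left to right.
  once <- gsub(pattern, "\\1_\\2", x)
  # Pass 2 splits the pairs that overlapped a pass-1 match.
  gsub(pattern, "\\1_\\2", once)
}

split_vowels("aerioue")
# [1] "a_eri_o_u_e"
```

Two passes should be enough here, since after the first pass any
remaining adjacent vowels are separated from their other neighbours by
an underscore and so no longer overlap one another.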

Cheers,
Bert


On Fri, Mar 1, 2024 at 3:59 AM Iago Giné Vázquez <iago.gine using sjd.es> wrote:
>
> Hi Iris,
>
> Thank you. Further, very nice solution.
>
> Best,
>
> Iago
>
> On 01/03/2024 12:49, Iris Simmons wrote:
> > Hi Iago,
> >
> >
> > This is not a bug. It is expected. Patterns may not overlap. However, there
> > is a way to get the result you want using perl:
> >
> > ```R
> > gsub("([aeiouAEIOU])(?=[aeiouAEIOU])", "\\1_", "aerioue", perl = TRUE)
> > ```
> >
> > The specific change I made is called a positive lookahead; you can read
> > more about it here:
> >
> > https://www.regular-expressions.info/lookaround.html
> >
> > It's a way to check for a piece of text without consuming it in the match.
> >
> > Also, since you don't care about character case, it might be more legible
> > to add ignore.case = TRUE and remove the upper case characters:
> >
> > ```R
> > gsub("([aeiou])(?=[aeiou])", "\\1_", "aerioue", perl = TRUE,
> >      ignore.case = TRUE)
> >
> > ## or
> >
> > gsub("(?i)([aeiou])(?=[aeiou])", "\\1_", "aerioue", perl = TRUE)
> > ```
> >
> > I hope this helps!
> >
> >
> > On Fri, Mar 1, 2024, 06:37 Iago Giné Vázquez <iago.gine using sjd.es> wrote:
> >
> >> Hi all,
> >>
> >> I tested the following command:
> >>
> >> gsub("([aeiouAEIOU])([aeiouAEIOU])", "\\1_\\2", "aerioue")
> >>
> >> with the following output:
> >>
> >> [1] "a_eri_ou_e"
> >>
> >> So, there are two consecutive vowels where an underscore is not added.
> >>
> >> Could this be a bug? Is it expected (bug or not)? Is there any way to get
> >> what I want (an underscore between each pair of consecutive vowels)?
> >>
> >>
> >> Thank you!
> >>
> >> Best regards,
> >>
> >> Iago
> >>
> >> ______________________________________________
> >> R-help using r-project.org  mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>


