[R] gsub issue with consecutive pattern finds

Bert Gunter <bgunter.4567 using gmail.com>
Fri Mar 1 17:00:01 CET 2024


Oh, wait a second. I misread your original post. Please ignore my
truly incorrect suggestion.

-- Bert
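
A minimal sketch of why the suggestion quoted below returns the same result as the original call, assuming the input contains no literal "?": the optional `\\?*` then always matches the empty string, so both patterns consume the same pairs of vowels. The variable names are only illustrative.

```r
x <- "aerioue"

# Original pattern: two adjacent vowels, both consumed by each match.
orig <- gsub("([aeiouAEIOU])([aeiouAEIOU])", "\\1_\\2", x)

# Variant with an optional literal "?" between the vowels. With no "?" in x,
# "\\?*" matches nothing, so this pattern consumes both vowels as well.
variant <- gsub("([aeiouAEIOU])\\?*([aeiouAEIOU])", "\\1_\\2", x)

print(orig)                      # "a_eri_ou_e"
print(identical(orig, variant))  # TRUE
```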

On Fri, Mar 1, 2024 at 7:57 AM Bert Gunter <bgunter.4567 using gmail.com> wrote:
>
> Here's another *incorrect* way to do it -- incorrect because it will
> not always work, unlike Iris's correct solution. But it does not
> require PERL type matching. The idea: separate the two vowels in the
> regex by a character that you know cannot appear (if there is such)
> and match it optionally, e.g. with the "*" repetition specifier. I used
> "?" for the optional character below (which must be escaped).
>
> >    gsub("([aeiouAEIOU])\\?*([aeiouAEIOU])", "\\1_\\2", "aerioue")
> [1] "a_eri_ou_e"
>
> Cheers,
> Bert
>
>
> On Fri, Mar 1, 2024 at 3:59 AM Iago Giné Vázquez <iago.gine using sjd.es> wrote:
> >
> > Hi Iris,
> >
> > Thank you. Further, very nice solution.
> >
> > Best,
> >
> > Iago
> >
> > On 01/03/2024 12:49, Iris Simmons wrote:
> > > Hi Iago,
> > >
> > >
> > > This is not a bug. It is expected. Matches may not overlap, so a vowel
> > > consumed by one match cannot start the next one. However, there
> > > is a way to get the result you want using perl:
> > >
> > > ```R
> > > gsub("([aeiouAEIOU])(?=[aeiouAEIOU])", "\\1_", "aerioue", perl = TRUE)
> > > ```
> > >
> > > The specific change I made is called a positive lookahead; you can read
> > > more about it here:
> > >
> > > https://www.regular-expressions.info/lookaround.html
> > >
> > > It's a way to check for a piece of text without consuming it in the match.
> > >
> > > Also, since you don't care about character case, it might be more legible
> > > to add ignore.case = TRUE and remove the upper case characters:
> > >
> > > ```R
> > > gsub("([aeiou])(?=[aeiou])", "\\1_", "aerioue",
> > >      perl = TRUE, ignore.case = TRUE)
> > >
> > > ## or
> > >
> > > gsub("(?i)([aeiou])(?=[aeiou])", "\\1_", "aerioue", perl = TRUE)
> > > ```
> > >
> > > I hope this helps!
> > >
> > >
> > > On Fri, Mar 1, 2024, 06:37 Iago Giné Vázquez <iago.gine using sjd.es> wrote:
> > >
> > >> Hi all,
> > >>
> > >> I tested the following command:
> > >>
> > >> gsub("([aeiouAEIOU])([aeiouAEIOU])", "\\1_\\2", "aerioue")
> > >>
> > >> with the following output:
> > >>
> > >> [1] "a_eri_ou_e"
> > >>
> > >> So, there are two consecutive vowels where an underscore is not added.
> > >>
> > >> Could it be a bug? Is it expected (bug or not)? Is there any way to get
> > >> what I want (an underscore between each pair of consecutive vowels)?
> > >>
> > >>
> > >> Thank you!
> > >>
> > >> Best regards,
> > >>
> > >> Iago
> > >>
> > >> ______________________________________________
> > >> R-help using r-project.org  mailing list -- To UNSUBSCRIBE and more, see
> > >> https://stat.ethz.ch/mailman/listinfo/r-help
> > >> PLEASE do read the posting guide
> > >> http://www.R-project.org/posting-guide.html
> > >> and provide commented, minimal, self-contained, reproducible code.
> > >>
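
As a sketch of the point made in the thread above, the match positions can be inspected with `gregexpr()`. This assumes lower-case input only and uses illustrative variable names. Without a lookahead each match consumes two vowels, so the "o"/"u" boundary is skipped; with the lookahead only the first vowel is consumed and every adjacent pair is found.

```r
x <- "aerioue"

# Start positions of the non-overlapping two-vowel matches: "ae", "io", "ue".
pair_starts <- as.integer(gregexpr("[aeiou]{2}", x)[[1]])
print(pair_starts)       # 1 4 6

# With a positive lookahead the second vowel is checked but not consumed,
# so "o" (position 5) can match as well.
lookahead_starts <- as.integer(
  gregexpr("[aeiou](?=[aeiou])", x, perl = TRUE)[[1]]
)
print(lookahead_starts)  # 1 4 5 6

# Insert an underscore after every vowel that is followed by another vowel.
result <- gsub("([aeiou])(?=[aeiou])", "\\1_", x, perl = TRUE)
print(result)            # "a_eri_o_u_e"
```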


