[R] Discovering patterns in textual strings

Bert Gunter bgunter.4567 at gmail.com
Sat May 5 00:41:17 CEST 2018


The answer is, of course, to use regular expressions and/or packages
built on them. However, I do not think you have defined your problem
sufficiently. Some questions I have:

1. Do possible patterns to be matched always appear at the beginning
of your strings?

2. Are they always delimited by a single specified separator ("_" in your
example), by one of several specified separators, or in some other way?

3. Do spaces, other whitespace, or nonprinting characters occur in your strings?

For example, would both of

abc_something
this.is_a long stringwithabcinthemiddle

be considered matches?
There are undoubtedly other possibilities that I've missed.
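
To make the difference concrete, here is a minimal sketch in base R of how the answers change the regular expression. It assumes the pattern of interest is the literal "abc" and the separator is "_"; both are my guesses at what you intend, and the variable names are only for illustration.

```r
# The two example strings from above
x <- c("abc_something", "this.is_a long stringwithabcinthemiddle")

# Reading A: "abc" may occur anywhere in the string
anywhere <- grepl("abc", x, fixed = TRUE)

# Reading B: "abc" must start the string and be followed by "_" or the end
at_start <- grepl("^abc(_|$)", x)

# Reading C: "abc" must be a complete "_"-delimited field, anywhere
as_field <- grepl("(^|_)abc(_|$)", x)

# One row per string, one column per reading
data.frame(x, anywhere, at_start, as_field)
```

Only reading A treats the second string as a match, which is why the questions above matter.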

You may also find it useful to check this "task view" out for possibilities:
https://cran.r-project.org/web/views/NaturalLanguageProcessing.html
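
If the answers to questions 1 and 2 turn out to be "yes" (patterns start at the beginning of the string and fields are separated by "_"), then one possible reading of "repeating patterns" is "leading runs of fields that occur in more than one record". That reading is my assumption, not something you stated. A base R sketch under that assumption:

```r
# The six records from the original post
x <- c("Abc_1234_kjhksh_276", "Abc", "Abc_1234_lakdofyo_324",
       "Bce_876_skdhk_*&^%*&", "Bce", "Bce_454")

# Split each record into fields on the literal separator "_"
fields <- strsplit(x, "_", fixed = TRUE)

# For each record, build every leading run of fields:
# "Abc", "Abc_1234", "Abc_1234_kjhksh", ...
prefixes <- lapply(fields, function(f) {
  vapply(seq_along(f),
         function(i) paste(f[seq_len(i)], collapse = "_"),
         character(1))
})

# Count the records in which each prefix occurs; keep those seen at least twice
counts <- table(unlist(prefixes))
repeated <- names(counts)[counts >= 2]
repeated
```

For the six records in your post this keeps Abc, Abc_1234 and Bce, which are the three results you listed. The threshold of 2 is arbitrary, and a different definition of "pattern" would need different code.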

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Fri, May 4, 2018 at 3:25 PM, Jeff Reichman <reichmanj at sbcglobal.net> wrote:
> R Help Forum
>
> Is there an R library (or some other way) to extract unique character strings,
> or repeating patterns, from textual strings? Say, for example, I have the
> following records:
>
> Abc_1234_kjhksh_276
> Abc
> Abc_1234_lakdofyo_324
> Bce_876_skdhk_*&^%*&
> Bce
> Bce_454
>
> and I would like to see the following results:
>
> Abc
> Abc_1234
> Bce
>
> Jeff Reichman


