[R] regexpr - ignore all special characters and punctuation in a string

Sven E. Templer sven.templer at gmail.com
Mon Apr 20 16:10:14 CEST 2015


Hi Dimitri,

str_replace_all is not in base R (it comes from the stringr package); you
could use 'gsub' instead, for example:

a = "What a nice day today! - Story of happiness: Part 2."
b = "What a nice day today: Story of happiness (Part 2)"
sa = gsub("[^A-Za-z0-9]", "", a)
sb = gsub("[^A-Za-z0-9]", "", b)
a==b
# [1] FALSE
sa==sb
# [1] TRUE

Note the extra space in a after the '-': spaces have to be removed as well,
which the pattern above already does, since it drops everything that is not a
letter or a digit.
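
If you need this in more than one place, the idea can be wrapped in a small
helper. This is only a sketch: the names strip_non_alnum, same_text and
normalize_words are made up for illustration, and the ignore_case argument is
an extra assumption that was not part of your question. The second function
shows a variant that keeps word boundaries by turning punctuation into spaces
(as in your [[:punct:]] attempt) and then collapsing runs of whitespace.

```r
a <- "What a nice day today! - Story of happiness: Part 2."
b <- "What a nice day today: Story of happiness (Part 2)"

# Hypothetical helper: keep only ASCII letters and digits.
strip_non_alnum <- function(x) gsub("[^A-Za-z0-9]", "", x)

# Hypothetical helper: compare two strings on letters and digits only.
# ignore_case is an added assumption, off by default.
same_text <- function(x, y, ignore_case = FALSE) {
  x <- strip_non_alnum(x)
  y <- strip_non_alnum(y)
  if (ignore_case) {
    x <- tolower(x)
    y <- tolower(y)
  }
  x == y
}

# Variant that keeps the words separated: punctuation becomes a space,
# runs of whitespace collapse to one space, and the ends are trimmed.
normalize_words <- function(x) {
  x <- gsub("[[:punct:]]", " ", x)
  x <- gsub("[[:space:]]+", " ", x)
  gsub("^ | $", "", x)
}

same_text(a, b)
# [1] TRUE
normalize_words(a) == normalize_words(b)
# [1] TRUE
```

Replacing punctuation by a space alone is not enough, because a and b then
differ in the number of spaces; collapsing the whitespace afterwards fixes that.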

Best,
Sven.

On 20 April 2015 at 16:05, Dimitri Liakhovitski <
dimitri.liakhovitski at gmail.com> wrote:

> I think I found a partial answer:
>
> str_replace_all(x, "[[:punct:]]", " ")
>
> On Mon, Apr 20, 2015 at 9:59 AM, Dimitri Liakhovitski
> <dimitri.liakhovitski at gmail.com> wrote:
> > Hello!
> >
> > Please point me in the right direction.
> > I need to match 2 strings, but focusing ONLY on characters, ignoring
> > all special characters and punctuation signs, including (), "", etc..
> >
> > For example:
> > I want the following to return: TRUE
> >
> > "What a nice day today! - Story of happiness: Part 2." ==
> >    "What a nice day today: Story of happiness (Part 2)"
> >
> >
> > --
> > Thank you!
> > Dimitri Liakhovitski
>
>
>
> --
> Dimitri Liakhovitski
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
