[R] regexpr - ignore all special characters and punctuation in a string

Dimitri Liakhovitski dimitri.liakhovitski at gmail.com
Mon Apr 20 17:20:16 CEST 2015


Thanks a lot, everybody, for the excellent suggestions!

On Mon, Apr 20, 2015 at 10:15 AM, Charles Determan
<cdetermanjr at gmail.com> wrote:
> You can use the [:alnum:] regex class with gsub: the negated bracket
> expression [^[:alnum:]] matches everything that is not a letter or digit.
>
> str1 <- "What a nice day today! - Story of happiness: Part 2."
> str2 <- "What a nice day today: Story of happiness (Part 2)"
>
> gsub("[^[:alnum:]]", "", str1) == gsub("[^[:alnum:]]", "", str2)
> [1] TRUE
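A minimal sketch, assuming base R only, that wraps the gsub() idea above in a reusable, vectorised comparison. The names strip_non_alnum() and same_ignoring_punct() are hypothetical helpers introduced for illustration; they are not functions from R or from this thread. Note that the interpretation of [:alnum:] depends on the current locale, so letters outside ASCII may be kept as well.

```r
# Hypothetical helper: drop every character that is not a letter or digit
# according to the current locale's [:alnum:] class.
strip_non_alnum <- function(x) gsub("[^[:alnum:]]", "", x)

# Hypothetical helper: element-wise comparison that ignores punctuation,
# spaces and other symbols.
same_ignoring_punct <- function(a, b) strip_non_alnum(a) == strip_non_alnum(b)

str1 <- "What a nice day today! - Story of happiness: Part 2."
str2 <- "What a nice day today: Story of happiness (Part 2)"

strip_non_alnum(str1)
# [1] "WhatanicedaytodayStoryofhappinessPart2"
same_ignoring_punct(str1, str2)
# [1] TRUE

# gsub() and == are vectorised, so whole vectors can be compared at once.
same_ignoring_punct(c(str1, "Part 2"), c(str2, "Part 3"))
# [1]  TRUE FALSE
```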
>
> The same can be done with the stringr package if you really are partial to
> it.
>
> library(stringr)
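The message stops after library(stringr). A minimal sketch of the stringr equivalent, assuming the stringr package is installed, could look like this. str_replace_all(string, pattern, replacement) is the same function used further down the thread; stringr patterns are ICU regular expressions, which also accept the POSIX-style [[:alnum:]] class.

```r
# Assumes the stringr package is installed.
library(stringr)

str1 <- "What a nice day today! - Story of happiness: Part 2."
str2 <- "What a nice day today: Story of happiness (Part 2)"

# str_replace_all() plays the role of gsub(); note the argument order:
# string first, then pattern, then replacement.
s1 <- str_replace_all(str1, "[^[:alnum:]]", "")
s2 <- str_replace_all(str2, "[^[:alnum:]]", "")
s1 == s2
# [1] TRUE
```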
>
> On Mon, Apr 20, 2015 at 9:10 AM, Sven E. Templer <sven.templer at gmail.com>
> wrote:
>>
>> Hi Dimitri,
>>
>> str_replace_all is not in base R (it comes from the stringr package), but
>> you could use 'gsub' as well, for example:
>>
>> a = "What a nice day today! - Story of happiness: Part 2."
>> b = "What a nice day today: Story of happiness (Part 2)"
>> sa = gsub("[^A-Za-z0-9]", "", a)
>> sb = gsub("[^A-Za-z0-9]", "", b)
>> a==b
>> # [1] FALSE
>> sa==sb
>> # [1] TRUE
>>
>> Take care of the extra space in a after the '-': replace the spaces as well
>> (the [^A-Za-z0-9] pattern above already does that)...
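Replacing punctuation with a space, as in str_replace_all(x, "[[:punct:]]", " "), keeps word boundaries but leaves runs of spaces that differ between the two strings, which is the problem noted above. A minimal base-R sketch, assuming gsub("[[:punct:]]", " ", x) as the counterpart of that call, squeezes the whitespace before comparing. normalize_punct() is a hypothetical helper name, and trimws() requires R 3.2.0 or later.

```r
# Hypothetical helper: punctuation -> space, then squeeze and trim whitespace.
normalize_punct <- function(x) {
  x <- gsub("[[:punct:]]", " ", x)   # each punctuation mark becomes a space
  x <- gsub("[[:space:]]+", " ", x)  # collapse runs of whitespace to one space
  trimws(x)                          # drop leading/trailing spaces (R >= 3.2.0)
}

a <- "What a nice day today! - Story of happiness: Part 2."
b <- "What a nice day today: Story of happiness (Part 2)"

# Replacing punctuation alone is not enough: the runs of spaces differ.
gsub("[[:punct:]]", " ", a) == gsub("[[:punct:]]", " ", b)
# [1] FALSE

normalize_punct(a)
# [1] "What a nice day today Story of happiness Part 2"
normalize_punct(a) == normalize_punct(b)
# [1] TRUE
```

Unlike stripping every non-alphanumeric character, this keeps single spaces between words, so "nice day" and "niceday" still compare as different.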
>>
>> Best,
>> Sven.
>>
>> On 20 April 2015 at 16:05, Dimitri Liakhovitski <
>> dimitri.liakhovitski at gmail.com> wrote:
>>
>> > I think I found a partial answer:
>> >
>> > str_replace_all(x, "[[:punct:]]", " ")
>> >
>> > On Mon, Apr 20, 2015 at 9:59 AM, Dimitri Liakhovitski
>> > <dimitri.liakhovitski at gmail.com> wrote:
>> > > Hello!
>> > >
>> > > Please point me in the right direction.
>> > > I need to match 2 strings, but focusing ONLY on letters and digits,
>> > > ignoring all special characters and punctuation signs, including (), "", etc.
>> > >
>> > > For example:
>> > > I want the following to return: TRUE
>> > >
>> > > "What a nice day today! - Story of happiness: Part 2." ==
>> > >    "What a nice day today: Story of happiness (Part 2)"
>> > >
>> > >
>> > > --
>> > > Thank you!
>> > > Dimitri Liakhovitski
>> >
>> >
>> >
>> > --
>> > Dimitri Liakhovitski
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>
>



-- 
Dimitri Liakhovitski