[R] Replace Text but not from within a word

Marc Schwartz marc_schwartz at me.com
Tue Feb 28 14:19:40 CET 2017


> On Feb 28, 2017, at 3:38 AM, Harshal Athawale <pgcim15.harshal at spjimr.org> wrote:
> 
> I am new to R.
> 
> I have a file that contains the names of companies.
> 'data.frame': 494 obs. of  1 variable:
> $ V1: Factor w/ 470 levels "3-d engineering corp",..: 293 134 339 359 143
> 399 122 447 398 384 ...
> 
> Problem: I would like to remove "CO" (as it is the most frequent word). I
> would like "CO" to be removed from BOEING CO --> BOEING but not from SAGINAW
> *CO*UNTY INC.
> 
>> text = c("BOEING CO","ENGMANTAYLOR CO","SAGINAW COUNTY INC")
> 
>> gsub(x = text, pattern = "CO", replacement = "")
> 
> [1] "BOEING "          "ENGMANTAYLOR "    "SAGINAW UNTY INC"
> 
> Thanks in advance.
> 
> - Sam


Hi,

See ?regex and ?grep for some details and examples on how to construct the expression used for matching, as well as some of the references therein.

In this case, you want to use something along the lines of:

> gsub(" CO$", "", text)
[1] "BOEING"             "ENGMANTAYLOR"       "SAGINAW COUNTY INC"

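Here the "CO" must be preceded by a space and followed by "$", a special character that anchors the match at the end of the string. A "CO" inside a word such as COUNTY is therefore left alone, as is a "CO" that is not the last word in the string.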
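
If "CO" can also appear as a separate word in the middle of a name, a word-boundary pattern is one possible alternative. The following is only a sketch under a few assumptions: the "ACME CO LTD" entry is a made-up example that is not in your data, perl = TRUE is my choice here so that \b is handled by PCRE, and the whitespace cleanup at the end is an extra step you may or may not want.

```r
# "ACME CO LTD" is a hypothetical entry added for illustration only
text <- c("BOEING CO", "ENGMANTAYLOR CO", "SAGINAW COUNTY INC", "ACME CO LTD")

# Anchored pattern: removes " CO" only at the very end of the string
end_only <- gsub(" CO$", "", text)

# Word-boundary pattern: removes "CO" wherever it stands alone as a word;
# \\b matches the empty string at the edge of a word, so COUNTY is untouched
whole_word <- gsub("\\bCO\\b", "", text, perl = TRUE)

# Removing a word leaves stray spaces behind: collapse runs of spaces,
# then trim leading and trailing whitespace
whole_word <- trimws(gsub(" +", " ", whole_word))

end_only
# [1] "BOEING"             "ENGMANTAYLOR"       "SAGINAW COUNTY INC" "ACME CO LTD"
whole_word
# [1] "BOEING"             "ENGMANTAYLOR"       "SAGINAW COUNTY INC" "ACME LTD"
```

Note that gsub() returns a character vector, so if the names are stored as a factor, as in your data frame, the result would need to be assigned back to the column. Since your factor levels appear to be lower case, adding ignore.case = TRUE may also be needed.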

Regards,

Marc Schwartz

