[R] Replace Text but not from within a word
Marc Schwartz
marc_schwartz at me.com
Tue Feb 28 15:50:18 CET 2017
> On Feb 28, 2017, at 8:36 AM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
>
> For tasks like this, you will probably want to make sure to import the data as character data rather than as a factor. E.g.
>
> dat <- read.csv( "myfile.csv", header=FALSE, as.is=TRUE )
>
> You can check what you have with the str() function.
Jeff,
Strictly speaking, that is not relevant to this particular task.
gsub() and family use as.character() internally to coerce a factor to character, so they will work just fine on a factor (the result is a character vector):
> text <- factor(c("BOEING CO","ENGMANTAYLOR CO","SAGINAW COUNTY INC"))
> text
[1] BOEING CO ENGMANTAYLOR CO SAGINAW COUNTY INC
Levels: BOEING CO ENGMANTAYLOR CO SAGINAW COUNTY INC
> gsub(" CO$", "", text)
[1] "BOEING" "ENGMANTAYLOR" "SAGINAW COUNTY INC"
Beyond this, whether to use 'as.is' is more a matter of personal preference.
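As a minimal sketch of the point above (the object names f, cleaned, f2 and f3 are made up for illustration): since gsub() hands back a character vector rather than a factor, a factor has to be rebuilt afterwards if one is wanted, or the pattern can be applied to the levels instead.

```r
# Illustrative data, mirroring the vector used above
f <- factor(c("BOEING CO", "ENGMANTAYLOR CO", "SAGINAW COUNTY INC"))

# gsub() coerces the factor with as.character() and returns character
cleaned <- gsub(" CO$", "", f)
is.character(cleaned)

# Option 1: rebuild a factor from the cleaned strings
f2 <- factor(cleaned)

# Option 2: edit the levels, which keeps the object a factor
f3 <- f
levels(f3) <- gsub(" CO$", "", levels(f3))
str(f3)
```

Which of the two is preferable probably depends on whether the rest of the code expects a factor or plain character data.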
Regards,
Marc
> --
> Sent from my phone. Please excuse my brevity.
>
> On February 28, 2017 5:19:40 AM PST, Marc Schwartz <marc_schwartz at me.com> wrote:
>>
>>> On Feb 28, 2017, at 3:38 AM, Harshal Athawale <pgcim15.harshal at spjimr.org> wrote:
>>>
>>> I am new to R.
>>>
>>> I have a file that contains the names of companies:
>>> 'data.frame': 494 obs. of 1 variable:
>>> $ V1: Factor w/ 470 levels "3-d engineering corp",..: 293 134 339 359 143 399 122 447 398 384 ...
>>>
>>> Problem: I would like to remove "CO" (as it is the most frequent word). I
>>> would like "CO" to be removed from BOEING CO --> BOEING, but not from
>>> SAGINAW *CO*UNTY INC.
>>>
>>>> text = c("BOEING CO","ENGMANTAYLOR CO","SAGINAW COUNTY INC")
>>>
>>>> gsub(x = text, pattern = "CO", replacement = "")
>>>
>>> [1] "BOEING " "ENGMANTAYLOR " "SAGINAW UNTY INC"
>>>
>>> Thanks in advance.
>>>
>>> - Sam
>>
>>
>> Hi,
>>
>> See ?regex and ?grep for some details and examples on how to construct
>> the expression used for matching, as well as some of the references
>> therein.
>>
>> In this case, you want to use something along the lines of:
>>
>>> gsub(" CO$", "", text)
>> [1] "BOEING" "ENGMANTAYLOR" "SAGINAW COUNTY INC"
>>
>> where the "CO" is preceded by a space and followed by the "$", which is
>> a special character that indicates the end of the string to be matched.
>>
>> Regards,
>>
>> Marc Schwartz
>>
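The " CO$" pattern quoted above only handles a trailing "CO". If "CO" can also appear as a separate word at the start or in the middle of a name, a word-boundary pattern is one possible variation. This is only a sketch: the names "CO OP BANK" and "ACME CO INC" are invented for illustration and are not from the original data, and perl = TRUE is simply one way to get \b handled by the PCRE engine.

```r
# Made-up examples; only the first and last come from the thread
text <- c("BOEING CO", "CO OP BANK", "ACME CO INC", "SAGINAW COUNTY INC")

# \\b marks a word boundary, so "CO" is matched only as a whole word
res <- gsub("\\bCO\\b", "", text, perl = TRUE)

# Removing a word can leave doubled or stray blanks; tidy them up
res <- trimws(gsub(" +", " ", res))
res
```

"SAGINAW COUNTY INC" is left alone here because the "CO" in "COUNTY" is followed by another letter, so there is no word boundary after it.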