[R] R Package for Text Manipulation

Gabor Grothendieck ggrothendieck at gmail.com
Sat Aug 9 15:01:34 CEST 2014


On Sat, Aug 9, 2014 at 8:15 AM, Omar André Gonzáles Díaz
<oma.gonzales at gmail.com> wrote:
> Hi all,
>
> I want to know where I can find a package to simulate the functions
> "Search and Replace" and "Find Words that contain - replace them with...",
> that we can use in EXCEL.
>
> I've looked in other places and they say: "reshape2" by Hadley Wickham.
> However, I've investigated it and it's not exactly what I'm looking for (its
> main functions are "cast" and "melt", as you surely know).
>
> Could you help me, please? I want to download data from Google Analytics
> and clean it. What is the best approach?

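1. The gsubfn function in the gsubfn package can do that.  In each
command below the regular expression \S+ matches each word (a run of
non-whitespace characters), and the formula in the second argument,
which is shorthand for a function of x, is applied to each match to
produce its replacement: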

library(gsubfn) # home page at http://gsubfn.googlecode.com
s <- "The quick brown fox" # test data

# replace the word quick with QUICK

gsubfn("\\S+", ~ if (x == "quick") "QUICK" else x, s)
## [1] "The QUICK brown fox"

# replace words containing o with ?

gsubfn("\\S+", ~ if (grepl("o", x)) "?" else x, s)
## [1] "The quick ? ?"

2. It can also be done without packages:

# replace quick with QUICK

gsub("\\bquick\\b", "QUICK", s)
## [1] "The QUICK brown fox"

# or the following, which first splits s into a vector of words,
# operates on that vector and pastes it back into a single string at the end

words <- strsplit(s, "\\s+")[[1]]
paste(replace(words, words == "quick", "QUICK"), collapse = " ")
## [1] "The QUICK brown fox"

# replace words containing o with ?.  Use `words` from above.

paste(replace(words, grepl("o", words), "?"), collapse = " ")
## [1] "The quick ? ?"
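
Since the stated goal is cleaning a downloaded table rather than a single
string, the same base R calls can be applied to a whole column.  This is
only a minimal sketch: the data frame `pages`, its `source` column and the
values in it are hypothetical stand-ins invented for illustration, not
output from Google Analytics.  It shows a literal (non-regex) replacement
via fixed = TRUE, which is probably closest to Excel's "Search and
Replace", followed by overwriting every cell that contains a given text.

```r
# Hypothetical data standing in for a downloaded export (assumption).
pages <- data.frame(
  source = c("google / organic", "(direct) / (none)", "Google / cpc"),
  stringsAsFactors = FALSE
)

# Literal search and replace: fixed = TRUE turns off regular expressions,
# so the parentheses are treated as ordinary characters.
pages$clean <- gsub("(none)", "none", pages$source, fixed = TRUE)

# "Find cells that contain ... and replace them with ...":
# overwrite the whole cell when it contains "google" in any letter case.
is_google <- grepl("google", pages$clean, ignore.case = TRUE)
pages$clean[is_google] <- "google"

pages$clean
```

Writing the result to a new column (`clean`, also a made-up name) keeps
the original values available for checking the replacements.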

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


