[R] R Package for Text Manipulation
Gabor Grothendieck
ggrothendieck at gmail.com
Sat Aug 9 15:01:34 CEST 2014
On Sat, Aug 9, 2014 at 8:15 AM, Omar André Gonzáles Díaz
<oma.gonzales at gmail.com> wrote:
> Hi all,
>
> I want to know where I can find a package to simulate the functions
> "Search and Replace" and "Find Words that contain - replace them with...",
> that we can use in Excel.
>
> I've looked in other places and they say: "Reshape2" by Hadley Wickham.
> However, I've investigated it and it's not exactly what I'm looking for (its
> main functions are "cast" and "melt", as I'm sure you know).
>
> Could you help me, please? I want to download data from Google Analytics
> and clean it. What is the best approach?
1. The gsubfn function in the gsubfn package can do that. The commands
below match each word (a run of non-whitespace characters, \S+) and pass
it to the function given in formula notation as the second argument; the
value that function returns replaces the word:
library(gsubfn) # home page at http://gsubfn.googlecode.com
s <- "The quick brown fox" # test data
# replace the word quick with QUICK
gsubfn("\\S+", ~ if (x == "quick") "QUICK" else x, s)
## [1] "The QUICK brown fox"
# replace words containing o with ?
gsubfn("\\S+", ~ if (grepl("o", x)) "?" else x, s)
## [1] "The quick ? ?"
2. It can also be done without packages:
# replace quick with QUICK
gsub("\\bquick\\b", "QUICK", s)
## [1] "The QUICK brown fox"
# or the following, which first splits s into a vector of words, operates
# on that vector, and pastes it back into a single string at the end
words <- strsplit(s, "\\s+")[[1]]
paste(replace(words, words == "quick", "QUICK"), collapse = " ")
## [1] "The QUICK brown fox"
# replace words containing o with ?. Use `words` from above.
paste(replace(words, grepl("o", words), "?"), collapse = " ")
## [1] "The quick ? ?"
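
Since the goal is to clean downloaded Google Analytics data, here is a minimal sketch of the same base R approach applied to a whole data frame column. gsub is vectorized, so no loop is needed. The data frame `ga`, its column names and its values are made up for illustration and are not taken from any real export.

```r
# Hypothetical stand-in for an exported Google Analytics table;
# the column names and values are assumptions for this example.
ga <- data.frame(
  source = c("google / organic", "(direct) / (none)", "Google / cpc"),
  visits = c(120, 80, 45),
  stringsAsFactors = FALSE
)

# Literal "search and replace": fixed = TRUE switches off regular
# expression interpretation, so "(" and ")" are matched as plain characters.
ga$source <- gsub("(none)", "none", ga$source, fixed = TRUE)

# "Find words that contain ... and replace them": \\S* on both sides of
# "goo" extends the match to the whole whitespace-delimited word.
# ignore.case = TRUE makes the match case-insensitive.
ga$source <- gsub("\\S*goo\\S*", "Google", ga$source, ignore.case = TRUE)

ga$source
## [1] "Google / organic" "(direct) / none"  "Google / cpc"
```

Use fixed = TRUE whenever the search text should be taken literally; without it, characters such as ( ) . ? and * are treated as regular expression syntax.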
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com