[R] Regular Expressions
Tony Plate
tplate at blackmesacapital.com
Tue Jul 13 02:11:57 CEST 2004
I'd suggest doing it with several regular expressions, one per word. You
could construct a single regular expression for this, but I expect it would
get quite complicated and possibly very slow.
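For comparison, a single-expression version might look something like the
sketch below. It assumes Perl-style lookaheads (perl = TRUE) and uses a
made-up vector, lines_demo, that is not part of the session in this message.

```r
# Hypothetical sample lines, for illustration only.
lines_demo <- c("thomas uses a program called perl",
                "perl is a program that thomas uses",
                "thomas uses a program")

# One lookahead per required word; every lookahead must succeed
# from the start of the line, so word order does not matter.
all_three <- "^(?=.*perl)(?=.*program)(?=.*thomas)"

hits <- grepl(all_three, lines_demo, perl = TRUE)
which(hits)   # indices of the lines containing all three words
```

This only covers the "all of the words" case. Asking for "at least two of
the three" in one pattern means spelling out every pairing, which is where
the single expression starts to get unwieldy.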
The expression for "y" in the example below tabulates how many words
matched for each line (i.e., line 2 matched 1 word, line 3 matched 3 words,
and line 4 matched 2 words).
> x <- readLines("clipboard", -1)
> x
[1] "Is there a way to use regular expressions to capture two or more words in a "
[2] "sentence? For example, I wish to to find all the lines that have the words \"thomas\", "
[3] "\"perl\", and \"program\", such as \"thomas uses a program called perl\", or \"perl is a "
[4] "program that thomas uses\", etc."
> sapply(c("perl","program","thomas"), function(re) grep(re, x))
$perl
[1] 3
$program
[1] 3 4
$thomas
[1] 2 3 4
> unlist(sapply(c("perl","program","thomas"), function(re) grep(re, x)), use.names=FALSE)
[1] 3 3 4 2 3 4
> y <- table(unlist(sapply(c("perl","program","thomas"), function(re) grep(re, x)), use.names=FALSE))
> y
2 3 4
1 3 2
> which(y>=2)
3 4
2 3
>
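Note that which(y>=2) returns positions within the table; the line numbers
are the names printed above them (3 and 4 here). A variant that returns line
numbers directly could be sketched as follows, using grepl() instead of
grep(). The names x_demo, words and n_matched are made up for illustration,
and x_demo is stand-in text, not the clipboard contents shown above.

```r
# Hypothetical input standing in for the lines read from the clipboard.
x_demo <- c("nothing relevant here",
            "only thomas on this line",
            "thomas uses a program called perl",
            "a program that thomas uses")

words <- c("perl", "program", "thomas")

# grepl() gives one TRUE/FALSE per line for each word; adding the
# logical vectors counts how many distinct words each line matched.
n_matched <- Reduce(`+`, lapply(words, grepl, x = x_demo))

which(n_matched >= 2)              # lines with at least two of the words
which(n_matched == length(words))  # lines with all of the words
```

Unlike the table() approach, lines with no match keep a count of zero, so
the positions in n_matched line up with the positions in the input.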
hope this helps,
Tony Plate
At Monday 05:59 PM 7/12/2004, Sangick Jeon wrote:
>Hi,
>
>Is there a way to use regular expressions to capture two or more words in a
>sentence? For example, I wish to find all the lines that have the words
>"thomas", "perl", and "program", such as "thomas uses a program called
>perl", or "perl is a program that thomas uses", etc.
>
>I'm sure this is a very easy task. I would greatly appreciate any help.
>Thanks!
>
>Sangick