[R] strapply and characters adjacent to the matched pattern

mdvaan mathijsdevaan at gmail.com
Wed Jul 25 22:34:44 CEST 2012


Thanks Gabor. That worked really well. I have been reading about POSIX
character classes and regular expressions, and I tried to adapt your example
to ignore all matches in which the character preceding (rather than
following) the match is one of [:alpha:]. So far I have been unsuccessful.
Could anyone help me out, or direct me to a source that explains how to
combine POSIX character classes with regular expressions? Thanks!

require(gsubfn)
# this only checks the character following the match and therefore
# also matches the third element;
# however, I want it to match only the 2nd, 5th and 6th elements
strapply(c("abc", "ab", "abdef", "defc", "def", " def "),
"(def|ab)($|[^[[:alpha:]])")

The outcome should look like this:
[[1]]
NULL

[[2]]
[1] "ab"

[[3]]
NULL

[[4]]
NULL

[[5]]
[1] "def"

[[6]]
[1] "def"
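
One possible way to express "not preceded by a letter" is a negative lookbehind. The sketch below uses base R (regmatches and gregexpr with perl = TRUE) rather than strapply, because lookbehind needs a PCRE-style engine and I have not verified how gsubfn's default engine handles it. The names x, pattern, hits and hits_or_null are only illustrative, and unmatched elements come back as character(0), which the last line converts to NULL to mirror the listing above.

```r
# Input vector from the example above
x <- c("abc", "ab", "abdef", "defc", "def", " def ")

# (?<![[:alpha:]])  the character before the match must not be a letter
# (?![[:alpha:]])   the character after the match must not be a letter
# Both assertions also succeed at the start or end of the string.
pattern <- "(?<![[:alpha:]])(def|ab)(?![[:alpha:]])"

# perl = TRUE selects the PCRE engine, which supports lookaround
hits <- regmatches(x, gregexpr(pattern, x, perl = TRUE))

# Replace empty results with NULL so the list resembles strapply output
hits_or_null <- lapply(hits, function(m) if (length(m) > 0) m else NULL)

# Only the 2nd, 5th and 6th elements contain a match
print(hits_or_null)
```

Because lookarounds consume no characters, the returned match is just "ab" or "def", without the neighbouring space that a pattern such as "(def|ab)($|[^[:alpha:]])" would otherwise include in the full match.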



Gabor Grothendieck wrote
> 
> On Tue, Jul 24, 2012 at 5:06 PM, mdvaan <mathijsdevaan@> wrote:
>> Hi,
>>
>> In the example below, one of the searched patterns "SE" is matched in the
>> word "second". I would like to ignore all matches in which the character
>> following the match is one of [:alpha:]. How do I do this without removing
>> the "ignore.case = T" argument of the strapply function? Thank you very
>> much!
>>
>> # load library
>> require(gsubfn)
>> # read in data
>> data <- c("Santa Fe Gold Corp|Starpharma Holdings|SE")
>> # define the object to be searched
>> text <- c("the first is Santa Fe Gold Corp",
>>           "the second is Starpharma Holdings")
>> # match
>> strapply(text, data, ignore.case = T)
>>
>> The preferred outcome would be:
>>
>> [[1]]
>> [1] "Santa Fe Gold Corp"
>>
>> [[2]]
>> [1] "Starpharma Holdings"
>>
>> instead of:
>>
>> [[1]]
>> [1] "Santa Fe Gold Corp"
>>
>> [[2]]
>> [1] "se"                  "Starpharma Holdings"
>>
>>
> 
> Try this:
> 
>> strapply(c("abc", "ab", "ab def"), "(ab|d)($|[^[[:alpha:]])")
> [[1]]
> NULL
> 
> [[2]]
> [1] "ab"
> 
> [[3]]
> [1] "ab"
> 
> 
> -- 
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
> 
> 





