[R] Regular expressions in R

Tue Nov 15 18:47:12 CET 2011

Hi Michael,

Your strings were long, so I made a smaller example. Sarah made a good
point: you want to be using gsub(), not sub(). But when I run your
code, I do not think it works correctly even for a single instance.
Try this on for size; you were 99% there:

## simplified cases
form1 <- c('product + action * mean + CTA + help + mean * product')
form2 <- c('product+action*mean+CTA+help+mean*product')

## what I believe your desired output is
'product + CTA + help'
'product+CTA+help'

## with spaces around the operators: ' + name * name'
gsub("\\s\\+\\s[[:alnum:]]*\\s\\*\\s[[:alnum:]]*", "", form1)
## without spaces: '+name*name'
gsub("\\+[[:alnum:]]*\\*[[:alnum:]]*", "", form2)

## your code (using gsub() instead of sub())
gsub("\\+*\\s*[[:alnum:]]*\\s*\\*.[[:alnum:]]", "", form1)

######## Running on r57586 Windows x64 ########
> gsub("\\s\\+\\s[[:alnum:]]*\\s\\*\\s[[:alnum:]]*", "", form1)
[1] "product + CTA + help"
> gsub("\\+[[:alnum:]]*\\*[[:alnum:]]*", "", form2)
[1] "product+CTA+help"
>
> ## your code (using gsub() instead of sub())
> gsub("\\+*\\s*[[:alnum:]]*\\s*\\*.[[:alnum:]]", "", form1)
[1] "product ean + CTA + help roduct"
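For reference, here is a sketch of why the original pattern behaves this way, and of a single pattern that should cover both spacing styles. In \\*.[[:alnum:]], the . matches exactly one character of any kind (here the space after *), and [[:alnum:]] with no quantifier matches exactly one letter, so each match stops after the first letter of the second name. The sketch uses base R's regmatches() and gregexpr() to print what the pattern actually matched, then contrasts sub() with gsub(). The combined pattern assumes that whitespace around + and * is optional (\\s*) and that variable names are purely alphanumeric (names containing . or _ would need a wider character class); the object names bad, good and form_full are only for illustration.

```r
## Simplified strings from above, plus the unspaced original
form1 <- 'product + action * mean + CTA + help + mean * product'
form2 <- 'product+action*mean+CTA+help+mean*product'
form_full <- '~Sentence+LEGAL+Intro+Intro/Intro1+Intro*LEGAL+benefit+benefit/benefit1+product+action*mean+CTA+help+mean*product'

## Original pattern: '.' eats one character after '*' and the
## unquantified [[:alnum:]] matches only ONE letter of the second name
bad <- "\\+*\\s*[[:alnum:]]*\\s*\\*.[[:alnum:]]"
regmatches(form1, gregexpr(bad, form1))[[1]]
## [1] "+ action * m" "+ mean * p"

## Assumed combined pattern: optional whitespace (\\s*) around the
## operators, one or more (+) alphanumerics for each variable name
good <- "\\s*\\+\\s*[[:alnum:]]+\\s*\\*\\s*[[:alnum:]]+"

sub(good, "", form1)    # sub() drops only the first match
## [1] "product + CTA + help + mean * product"
gsub(good, "", form1)   # gsub() drops every match
## [1] "product + CTA + help"
gsub(good, "", form2)
## [1] "product+CTA+help"
gsub(good, "", form_full)
## [1] "~Sentence+LEGAL+Intro+Intro/Intro1+benefit+benefit/benefit1+product+CTA+help"
```

Using + rather than * after [[:alnum:]] requires at least one character in each name, so the pattern cannot match an operator pair with no names around it.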

Hope this helps,

Josh

On Tue, Nov 15, 2011 at 9:18 AM, Michael Griffiths
<griffiths at upstreamsystems.com> wrote:
> Good afternoon list,
>
> I have the following character strings; one with spaces between the maths
> operators and variable names, and one without said spaces.
>
> form<-c('~ Sentence + LEGAL + Intro + Intro / Intro1 + Intro * LEGAL + benefit + benefit / benefit1 + product + action * mean + CTA + help + mean * product')
> form<-c('~Sentence+LEGAL+Intro+Intro/Intro1+Intro*LEGAL+benefit+benefit/benefit1+product+action*mean+CTA+help+mean*product')
>
> I would like to remove the following target strings, either:
>
> 1. '+ Intro * LEGAL', which is '+ space name space * space name'
> 2. '+Intro*LEGAL', which is '+ nospace name nospace * nospace name'
>
> Having delved into a variety of sites (e.g.
> http://www.zytrax.com/tech/web/regex.htm#search) investigating regular
> expressions I now have a basic grasp, but I am having difficulties removing
> ALL of the instances of 1. or 2.
>
> The code below removes just a SINGLE instance of the target string, but I
> was expecting it to remove all instances as I have \\*.[[:alnum:]]. I did
> try \\*.[[:alnum:]]*, but this did not work.
>
> form<-sub("\\+*\\s*[[:alnum:]]*\\s*\\*.[[:alnum:]]", "", form)
>
> I am obviously still not understanding something. If the list could offer
> some guidance I would be most grateful.
>
> Regards
>
> Mike Griffiths
>
>
>
> --
>
> *Michael Griffiths, Ph.D
> *Statistician
>
> *Upstream Systems*
>
> 8th Floor
> Portland House
> Bressenden Place
> SW1E 5BH
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/