[R] Need some help with regular expression
Steven Nagy
nstefi at gmail.com
Sun Nov 20 05:06:36 CET 2016
I tried out a regular expression on this website:
http://regexr.com/3en1m
So the input text is:
"Name.MEMBER_TYPE: -> STU"
The regular expression is: ((?:\w+|\s) -> STU|STU -> (?:\w+|\s))
And it returns:
" -> STU"
but when I use it in R, it does not return the same result:
strapply(c, "((?:\\w+|\\s) -> STU|STU -> (?:\\w+|\\s))", c, backref = -1,
perl = TRUE)
returns:
"Name.MEMBER_TYPE: -> STU"
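As a cross-check that does not involve strapply, the same pattern can be run through base R's regexpr() and regmatches(). This is only a sketch: the names pattern, one_space and two_spaces are made up for illustration, and the two-space variant is an assumption that the real log cell has an extra space where the blank value sits (the text above shows only one).

```r
# Same pattern as above, with backslashes doubled for an R string.
pattern <- "((?:\\w+|\\s) -> STU|STU -> (?:\\w+|\\s))"

# Input exactly as typed above: one space between ":" and "->".
one_space <- "Name.MEMBER_TYPE: -> STU"

# Assumed variant: an extra space where the blank source value would be.
two_spaces <- "Name.MEMBER_TYPE:  -> STU"

# regexpr() locates the first match; regmatches() extracts it.
match_one <- regmatches(one_space, regexpr(pattern, one_space, perl = TRUE))
match_two <- regmatches(two_spaces, regexpr(pattern, two_spaces, perl = TRUE))

print(match_one)
print(match_two)
```

With exactly one space after the colon there is nothing left for (?:\w+|\s) to consume before " -> STU", so the number of spaces in the input matters when comparing results between tools.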
Here is what I was trying to do:
I need to extract some values from a log table, and I created a regular
expression that helps me with that.
The log table has cells with values like:
a = "Name.MEMBER_TYPE: NMA -> STU ; CATEGORY: -> 1 ; CITY: MISSISSAUGA ->
Mississauga ; ZIP: L5N1H9 -> L5N 1H9 ; COUNTRY: CAN -> ; MEMBER_STATUS: ->
N"
or
b = "Name.MEMBER_TYPE: STU -> REG ; CATEGORY: 1 ->"
I need to extract the values that the member type changes from and to
whenever STU is involved: NMA, STU in the 1st case and STU, REG in the
2nd case.
I came up with this expression which worked in both cases:
strapply(strapply(a, "(\\w+ -> STU|STU -> \\w+)", c, backref = -1, perl =
TRUE), "(\\w+) -> (\\w+)", c, backref = -2, perl = TRUE)
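For comparison, the two nested strapply calls can be approximated in one step with base R's regexec(), anchoring on the MEMBER_TYPE label and capturing both sides of the arrow. This is a sketch rather than a drop-in replacement: the helper name member_type_change is hypothetical, the inputs are shortened copies of the cells above, and \\w* (instead of \\w+) is chosen on the assumption that either side may be blank.

```r
# Hypothetical helper: returns c(from, to) for the MEMBER_TYPE entry,
# or NULL when the cell has no MEMBER_TYPE change at all.
member_type_change <- function(x) {
  m <- regmatches(x, regexec("MEMBER_TYPE:\\s*(\\w*)\\s*->\\s*(\\w*)", x))[[1]]
  if (length(m) == 0) return(NULL)
  m[2:3]  # element 1 is the full match, 2 and 3 are the capture groups
}

a_short <- "Name.MEMBER_TYPE: NMA -> STU ; CATEGORY: -> 1"
b_cell  <- "Name.MEMBER_TYPE: STU -> REG ; CATEGORY: 1 ->"
blank   <- "Name.MEMBER_TYPE: -> STU"

res_a     <- member_type_change(a_short)
res_b     <- member_type_change(b_cell)
res_blank <- member_type_change(blank)

# Keep only changes that involve STU on either side.
involves_stu <- function(pair) !is.null(pair) && "STU" %in% pair
```

Because the pattern is tied to the MEMBER_TYPE label, other fields such as CATEGORY or CITY are not picked up even if they contain an arrow.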
But I had a 3rd case when the source member type was blank:
c = "Name.MEMBER_TYPE: -> STU"
and in that case it returned an error:
strapply(strapply(c, "(\\w+ -> STU|STU -> \\w+)", c, backref = -1, perl =
TRUE), "(\\w+) -> (\\w+)", c, backref = -2, perl = TRUE)
Error: is.character(x) is not TRUE
I found that the error occurs because this call returns NULL:
strapply(c, "(\\w+ -> STU|STU -> \\w+)", c, backref = -1, perl = TRUE)
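Since the second step fails when the first step finds nothing, one option is to make the "no match" case explicit before passing the result on. The sketch below uses base R rather than strapply; first_match is a hypothetical helper name, and returning NA for "no match" is just one possible convention.

```r
# Hypothetical helper: first match of `pattern` in a single string `x`,
# or NA_character_ when there is no match.
first_match <- function(x, pattern) {
  hit <- regmatches(x, regexpr(pattern, x, perl = TRUE))
  if (length(hit) == 0) NA_character_ else hit
}

pattern <- "(\\w+ -> STU|STU -> \\w+)"

hit_a     <- first_match("Name.MEMBER_TYPE: NMA -> STU ; CATEGORY: -> 1", pattern)
hit_b     <- first_match("Name.MEMBER_TYPE: STU -> REG ; CATEGORY: 1 ->", pattern)
hit_blank <- first_match("Name.MEMBER_TYPE: -> STU", pattern)

# The caller can now branch on is.na() instead of hitting an error later.
if (is.na(hit_blank)) message("no 'word -> STU' or 'STU -> word' match")
```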
So I tried to modify the regular expression to match any word or blank
space:
strapply(c, "((?:\\w+|\\s) -> STU|STU -> (?:\\w+|\\s))", c, backref = -1,
perl = TRUE)
but this returned the whole value of "c":
"Name.MEMBER_TYPE: -> STU"
and I only needed " -> STU", as shown on the regexr.com website.
Is the result on regexr.com wrong, or does strapply return the wrong
result?
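One thing that may be worth ruling out: in the calls above, c is both the name of the data variable and the value passed as the function argument. In R, a name in call position, as in c(1, 2), skips non-function objects, but a name passed as an argument value is looked up like any other variable. The sketch below only demonstrates that base R lookup rule; whether it explains the strapply output is an assumption, not something confirmed here.

```r
# Define a variable that shares its name with base::c.
c <- "Name.MEMBER_TYPE: -> STU"

# Passed as a value, `c` is the string, not a function.
arg_is_function <- is.function(c)

# In call position, R skips the string and still finds base::c.
call_result <- c(1, 2)

# Removing the variable makes `c` refer to the function again.
rm(c)
arg_is_function_after <- is.function(c)
```

If this turns out to be relevant, using a different variable name for the input (or passing base::c explicitly) would separate the two meanings.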
Thanks,
Steven