[R] regex - optional part isn't considered in replacement with gsub
Omar André Gonzáles Díaz
oma.gonzales at gmail.com
Sun Aug 27 18:18:52 CEST 2017
Hello, I need some help with regex.
I have these two sentences. I need to extract both "49MU6300" and "LE32S5970"
and put them in a new column "SKU".
A) SMART TV UHD 49'' CURVO 49MU6300
B) SMART TV HD 32'' LE32S5970
DataFrame for testing:
ecommerce <- data.frame(a = c(1, 2),
                        producto = c("SMART TV UHD 49'' CURVO 49MU6300",
                                     "SMART TV HD 32'' LE32S5970"))
I'm using gsub like this:
1.- This captures A as intended, but only "32S5970" from B (missing "LE").
ecommerce$sku <- gsub("(.*)([0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})(.*)", "\\2",
ecommerce$producto)
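Why "LE" is lost here: the second group in pattern 1 starts with `[0-9]{2}`, so letters in front of the first two digits can never be part of it, and the greedy `(.*)` simply absorbs them. The sketch below re-runs pattern 1 on the two sample strings; the vector names `x` and `sku1` are only for illustration.

```r
# The two sample product descriptions from the post
x <- c("SMART TV UHD 49'' CURVO 49MU6300",
       "SMART TV HD 32'' LE32S5970")

# Pattern 1: group 2 begins with two digits, so any letters before them
# (such as "LE") end up in group 1, which the replacement "\\2" discards
sku1 <- gsub("(.*)([0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})(.*)", "\\2", x)
print(sku1)
# [1] "49MU6300" "32S5970"
```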
2.- This captures "LE32S5970" but not "49MU6300".
ecommerce$sku <-
gsub("(.*)([a-zA-Z]{2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})(.*)", "\\2",
ecommerce$producto)
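With pattern 2 the two leading letters are mandatory. In A, "49MU6300" is preceded by a space, so the pattern does not match that string at all, and `gsub()` then returns its input unchanged rather than an empty string. A small check with `grepl()` (the variable names are illustrative):

```r
x <- c("SMART TV UHD 49'' CURVO 49MU6300",
       "SMART TV HD 32'' LE32S5970")

pattern2 <- "(.*)([a-zA-Z]{2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})(.*)"

# grepl() reports whether the pattern matches each string at all
matched <- grepl(pattern2, x)
print(matched)   # FALSE for A, TRUE for B

# Where there is no match, gsub() leaves the string as it was,
# so the "SKU" for A is the whole product description
sku2 <- gsub(pattern2, "\\2", x)
print(sku2)
```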
3.- If I make the first two letters optional:
ecommerce$sku <-
gsub("(.*)([a-zA-Z]?{2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})(.*)", "\\2",
ecommerce$producto)
"49MU6300" is captured, but again only "32S5970" from B (missing "LE").
What should I do? How would you approach it?
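A likely explanation for case 3, with a sketch of two possible fixes (assuming Perl-compatible regexes via `perl = TRUE`; the column `sku2` and the other object names are illustrative). `(.*)` is greedy: it takes as many characters as it can while the rest of the pattern can still match. Because the leading letters are optional, the rest of the pattern still matches "32S5970" after `(.*)` has absorbed "LE", so that is the match returned. Making the prefix lazy (`.*?`) lets the SKU group start at the first place it fits, and `{0,2}` is the conventional way to write "zero to two letters" (rather than `?{2}`). Alternatively, `regexpr()` plus `regmatches()` extracts only the matched text, with word boundaries (`\\b`) keeping the match from starting mid-word.

```r
ecommerce <- data.frame(
  a = c(1, 2),
  producto = c("SMART TV UHD 49'' CURVO 49MU6300",
               "SMART TV HD 32'' LE32S5970"),
  stringsAsFactors = FALSE
)

# Fix 1: lazy prefix. ".*?" stops at the first position where the SKU
# pattern can start; the greedy "[a-zA-Z]{0,2}" then keeps up to two letters
lazy_pattern <- "^.*?([a-zA-Z]{0,2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4}).*$"
ecommerce$sku <- sub(lazy_pattern, "\\1", ecommerce$producto, perl = TRUE)

# Fix 2: extract instead of replace. regexpr() gives the first match
# position per string (-1 when there is none), regmatches() the matched text
sku_pattern <- "\\b[a-zA-Z]{0,2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4}\\b"
m <- regexpr(sku_pattern, ecommerce$producto, perl = TRUE)
ecommerce$sku2 <- NA_character_              # rows without a match stay NA
ecommerce$sku2[m != -1] <- regmatches(ecommerce$producto, m)

print(ecommerce)
```

Fix 1 leaves a non-matching string unchanged (the same behaviour seen in case 2), while fix 2 marks it as `NA`, which may be easier to spot in a large data frame.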