[R] regex - optional part isn't considered in replacement with gsub

Omar André Gonzáles Díaz oma.gonzales at gmail.com
Sun Aug 27 18:18:52 CEST 2017


Hello, I need some help with regex.

I have these two sentences. I need to extract both "49MU6300" and "LE32S5970"
and put them in a new column "SKU".

A) SMART TV UHD 49'' CURVO 49MU6300
B) SMART TV HD 32'' LE32S5970

DataFrame for testing:

ecommerce <- data.frame(a = c(1, 2),
                        producto = c("SMART TV UHD 49'' CURVO 49MU6300",
                                     "SMART TV HD 32'' LE32S5970"))


I'm using gsub like this:

1.- This would capture A as intended but only "32S5970" from B (missing
"LE").

ecommerce$sku <- gsub("(.*)([0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})(.*)", "\\2",
ecommerce$producto)


2.- This would capture "LE32S5970" but not "49MU6300".

ecommerce$sku <-
gsub("(.*)([a-zA-Z]{2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})(.*)", "\\2",
ecommerce$producto)


3.- If I make the first two letters optional with:

ecommerce$sku <-
gsub("(.*)([a-zA-Z]?{2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})(.*)", "\\2",
ecommerce$producto)

"49MU6300" is captured, but again only "32S5970" from B (missing "LE").


What should I do? How would you approach it?
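
One possible explanation, offered as a sketch rather than a definitive answer: the leading "(.*)" is greedy, so it takes as much of the string as it can, and because the two letters are optional the match still succeeds with "LE" left inside the first group. Making the leading group lazy ("(.*?)") lets the SKU group start as early as possible instead. The example below assumes perl = TRUE and writes the optional prefix as {0,2}; the names sku_part, pat_greedy and pat_lazy are made up for illustration.

```r
# Test strings from the post, kept as a plain character vector.
producto <- c("SMART TV UHD 49'' CURVO 49MU6300",
              "SMART TV HD 32'' LE32S5970")

# SKU: up to two optional letters, two digits, one or two letters, two to four digits.
sku_part <- "([a-zA-Z]{0,2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})"

# Greedy prefix: "(.*)" swallows the optional letters before group 2 starts.
pat_greedy <- paste0("(.*)", sku_part, "(.*)")
# Lazy prefix: "(.*?)" stops at the first position where group 2 can match.
pat_lazy <- paste0("(.*?)", sku_part, "(.*)")

sku_greedy <- gsub(pat_greedy, "\\2", producto, perl = TRUE)
sku_lazy <- gsub(pat_lazy, "\\2", producto, perl = TRUE)

print(sku_greedy)  # "49MU6300" "32S5970"
print(sku_lazy)    # "49MU6300" "LE32S5970"
```

Note that gsub returns the input unchanged when the pattern does not match, so a product name without an SKU of this shape would be copied into the new column as-is.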
