[R] regex - optional part isn't considered in replacement with gsub

Bert Gunter bgunter.4567 at gmail.com
Sun Aug 27 19:10:56 CEST 2017


You may have to provide us more detail on **exactly** the sorts of
patterns you wish to "capture" -- including exactly what you mean by
"capture" (what value do you wish to return?) -- as the "obvious"
answer is probably not sufficient:

## using your example -- thank you

> gsub(".*(49MU6300|LE32S5970).*","\\1",ecommerce[[2]])
[1] "49MU6300"  "LE32S5970"
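
If the codes are not known in advance, a more general pattern might work. The sketch below is only an illustration under assumptions: it takes the shape of a code from the original attempts (up to two letters, two digits, one or two letters, two to four digits), writes the optional prefix as `{0,2}` rather than `?{2}`, and assumes the code is always preceded by a space. The leading greedy `(.*)` grabs as much as it can, so an optional prefix ends up matching nothing; requiring a space in front of the code keeps the letters in the capture.

```r
producto <- c("SMART TV UHD 49'' CURVO 49MU6300",
              "SMART TV HD 32'' LE32S5970")

# Greedy "(.*)" consumes the "LE", because the optional letters
# are allowed to match zero characters.
greedy <- gsub("(.*)([a-zA-Z]{0,2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})(.*)",
               "\\2", producto)

# Assumption: the code is always preceded by a space. The literal space
# stops ".*" from swallowing the leading letters.
anchored <- gsub(".* ([a-zA-Z]{0,2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4}).*",
                 "\\1", producto)

print(greedy)
print(anchored)
```

Strings that do not match the pattern at all are returned unchanged by gsub(), so real data may need a separate check for non-matches.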

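Another possibility, again only a sketch under the same assumed code shape, is to extract the match directly with regexpr() and regmatches() instead of replacing everything around it. With no leading ".*" in the pattern, the search returns the leftmost match, which starts at the optional letters when they are present.

```r
producto <- c("SMART TV UHD 49'' CURVO 49MU6300",
              "SMART TV HD 32'' LE32S5970")

# Assumed shape of a code: 0-2 letters, 2 digits, 1-2 letters, 2-4 digits
sku_pattern <- "[a-zA-Z]{0,2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4}"

# regexpr() locates the first match in each string;
# regmatches() pulls out the matched substrings.
sku <- regmatches(producto, regexpr(sku_pattern, producto))

print(sku)
```

Note that regmatches() drops elements with no match, so the result can be shorter than the input; that matters when assigning it to a data frame column.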

Cheers,
Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Sun, Aug 27, 2017 at 9:18 AM, Omar André Gonzáles Díaz
<oma.gonzales at gmail.com> wrote:
> Hello, I need some help with regex.
>
> I have these two sentences. I need to extract both "49MU6300" and "LE32S5970"
> and put them in a new column "SKU".
>
> A) SMART TV UHD 49'' CURVO 49MU6300
> B) SMART TV HD 32'' LE32S5970
>
> DataFrame for testing:
>
> ecommerce <- data.frame(a = c(1,2),
>                         producto = c("SMART TV UHD 49'' CURVO 49MU6300",
>                                      "SMART TV HD 32'' LE32S5970"))
>
>
> I'm using gsub like this:
>
> 1.- This would capture A as intended but only "32S5970" from B (missing
> "LE").
>
> ecommerce$sku <- gsub("(.*)([0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})(.*)", "\\2",
> ecommerce$producto)
>
>
> 2.- This would capture "LE32S5970" but not "49MU6300".
>
> ecommerce$sku <-
> gsub("(.*)([a-zA-Z]{2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})(.*)", "\\2",
> ecommerce$producto)
>
>
> 3.- If I make the first 2 letters optional with:
>
> ecommerce$sku <-
> gsub("(.*)([a-zA-Z]?{2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})(.*)", "\\2",
> ecommerce$producto)
>
>
> "49MU6300" is captured, but again only "32S5970" from B (missing "LE").
>
>
> What should I do? How would you approach it?