[R] regex - extracting 2 numbers and " from strings

Omar André Gonzáles Díaz oma.gonzales at gmail.com
Fri Oct 9 20:53:14 CEST 2015


Yes, you are right. Thank you.

2015-10-08 20:07 GMT-05:00 David Winsemius <dwinsemius at comcast.net>:

>
> On Oct 8, 2015, at 4:50 PM, Omar André Gonzáles Díaz wrote:
>
> > David, it does work but not in all cases:
>
> It should work if you change the "+" to "*" in the last capture group. That
> makes any characters after the quote mark entirely optional.
>
> > sub("(^.+ )(\\d+)([\"]|[']{2})(.*$)", "\\2\\3", b)
>  [1] "40''" "40''" "49\"" "49\"" "28\"" "40\"" "32''" "32''" "40\"" "55\""
> [11] "40\"" "24\"" "42''" "50\"" "48\"" "48\"" "48\"" "48''" "50\"" "50''"
> [21] "50\"" "55\"" "55''" "55\"" "55''" "55\"" "65''" "65\"" "65''" "75\""
>
>
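A small check of the "+" versus "*" point above, as a sketch only. The vector `x` is a made-up two-element sample (not the thread's full data) with one size in the middle of the string and one at the very end. With `.+` the final group needs at least one character after the quote mark, so the second string does not match and `sub()` returns it unchanged.

```r
# Hypothetical sample: one size mid-string, one at the end of the string
x <- c("SMART TV LCD FHD 70\" LC70LE660",
       "HAIER TELEVISOR LED LE28F6600 28\"")

# Last group ".+" requires at least one trailing character
strict <- sub("(^.+ )(\\d+)([\"]|[']{2})(.+$)", "\\2\\3", x)

# Last group ".*" allows zero trailing characters
relaxed <- sub("(^.+ )(\\d+)([\"]|[']{2})(.*$)", "\\2\\3", x)

print(strict)   # second element comes back unchanged
print(relaxed)  # both elements reduced to the size
```

Because `sub()` leaves non-matching elements untouched, a failed match is easy to miss unless the output is inspected.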
> Moral of the story: Always post an example with the necessary complexity.
> >
> > This is now my b vector, after your solution:
> >
> > b <- c("40''", "40''", "49\"", "49\"", "HAIER TELEVISOR LED LE28F6600 28\"",
> > "40\"", "32''", "32''", "40\"", "55\"", "HAIER TV LED LE40B8000 FULL HD 40\"",
> > "24\"", "42''", "HAIER TELEVISOR LED LE50K5000N 50\"", "48\"",
> > "48\"", "48\"", "48''", "50\"", "50''", "50\"", "55\"", "55''",
> > "55\"", "55''", "55\"", "65''", "SAMSUNG SMART TV 65JU6500 LED UHD 65\"",
> > "65''", "75\"")
> >
> > 2015-10-08 18:14 GMT-05:00 David Winsemius <dwinsemius at comcast.net>:
> >
> > On Oct 8, 2015, at 3:45 PM, Omar André Gonzáles Díaz wrote:
> >
> > > Hi, I have a vector of 100 elements like these:
> > >
> > > a <- c("SMART TV LCD FHD 70\" LC70LE660", "LED FULL HD 58'' LE58D3140")
> > >
> > > I want to put just the (70\") and (58'') in a vector b.
> >
> > > sub("(^.+ )(\\d+)([\"]|[']{2})(.+$)", "\\2\\3", a)
> > [1] "70\"" "58''"
> >
> > Also, the `stringr` package uses the code in the `stringi` package to
> > give more compact expressions. You might want to look at
> >
> > str_extract     Extract matching patterns from a string.
> > str_extract_all Extract matching patterns from a string.
> >
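As a sketch of the extraction approach mentioned above: instead of replacing the whole string with captured groups, one can pull out the matching piece directly. The example uses base R's `regmatches()` with `regexpr()`, which needs no extra package. The pattern `size_pattern` is my own assumption about what a size looks like (digits followed by either `"` or `''`), not something given in the thread. With `stringr` installed, `str_extract(a, size_pattern)` should give the same result for this input.

```r
a <- c("SMART TV LCD FHD 70\" LC70LE660", "LED FULL HD 58'' LE58D3140")

# Assumed pattern: one or more digits followed by a double quote
# or by two single quotes
size_pattern <- "[0-9]+(\"|'')"

# regexpr() locates the first match in each string;
# regmatches() extracts the matched text
b <- regmatches(a, regexpr(size_pattern, a))
print(b)
```

One difference to keep in mind: `regmatches()` silently drops elements that have no match, whereas `str_extract()` returns `NA` for them, so the lengths of input and output can differ in the base R version.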
> >
> > >
> > > This is my attempt, but it is not working:
> > >
> > > b <- grepl('^[0-9]{2}""$',a)
> > >
> > > Any hint is welcome, thanks.
> > >
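For what it is worth, a sketch of why the `grepl()` attempt above cannot produce the sizes: `grepl()` only reports whether each element matches (a logical vector), and the anchors `^` and `$` together with `{2}` demand that the whole string be exactly two digits followed by two double-quote characters. The variable names `hits` and `has_size` are mine.

```r
a <- c("SMART TV LCD FHD 70\" LC70LE660", "LED FULL HD 58'' LE58D3140")

# grepl() returns TRUE/FALSE per element, never the matched text
hits <- grepl('^[0-9]{2}""$', a)
print(hits)

# Dropping the anchors and accepting either quote style at least
# detects which elements contain a size
has_size <- grepl("[0-9]+(\"|'')", a)
print(has_size)
```

Detection alone still does not return the size itself; that takes `sub()` with capture groups or an extraction function.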
> >
> > David Winsemius
> > Alameda, CA, USA
> >
> >
>
> David Winsemius
> Alameda, CA, USA
>
>
