[R] extract all numbers from a string
Gabor Grothendieck
ggrothendieck at gmail.com
Mon Jun 17 05:42:44 CEST 2013
On Sun, Jun 16, 2013 at 9:00 PM, Nick Matzke <matzke at berkeley.edu> wrote:
> Thanks *VERY* much, this is great!
>
> I realized a few more cases, I think I've got something that covers all the
> possibilities now:
>
>
>
> library(stringr)
> tmpstr = "The first number is: 32. Another one is: 32.1. Here's a number in scientific format, 0.3523e10, and another, 0.3523e-10, and a negative, -313.1"
>
> patternslist = NULL
> p=0
> patternslist[[(p=p+1)]] = "(\\d+)" # positive integer
> patternslist[[(p=p+1)]] = "(-\\d+)" # negative integer
> patternslist[[(p=p+1)]] = "(\\d+\\.\\d+)" # positive float
> patternslist[[(p=p+1)]] = "(\\d+\\.\\d+e\\d+)" # positive float, scientific w. positive power
> patternslist[[(p=p+1)]] = "(\\d+\\.\\d+e-\\d+)" # positive float, scientific w. negative power
> patternslist[[(p=p+1)]] = "(-\\d+\\.\\d+)" # negative float
> patternslist[[(p=p+1)]] = "(-\\d+\\.\\d+e\\d+)" # negative float, scientific w. positive power
> patternslist[[(p=p+1)]] = "(-\\d+\\.\\d+e-\\d+)" # negative float, scientific w. negative power
>
> patternslist[[(p=p+1)]] = "(\\d+e\\d+)" # positive int, scientific w. positive power
> patternslist[[(p=p+1)]] = "(\\d+e-\\d+)" # positive int, scientific w. negative power
> patternslist[[(p=p+1)]] = "(-\\d+e\\d+)" # negative int, scientific w. positive power
> patternslist[[(p=p+1)]] = "(-\\d+e-\\d+)" # negative int, scientific w. negative power
>
> pattern = paste(patternslist, collapse="|", sep="")
> pattern
> as.numeric(str_extract_all(tmpstr,pattern)[[1]])
>
> # A more complex string
> tmpstr = "The first number is: 32. 342 342.1 -3234e-10 3234e-1 Another one is: 32.1. Here's a number in scientific format, 0.3523e10, and another, 0.3523e-10, and a negative, -313.1"
> #pattern = "(\\d)+|(-\\d)+|(\\d+\\.\\d+)|(-\\d+\\.\\d+)|(\\d+.\\d+e\\d+)|(\\d+\\.\\d+e-\\d+)|(-\\d+.\\d+e\\d+)|(-\\d+\\.\\d+e-\\d+)"
> as.numeric(str_extract_all(tmpstr,pattern)[[1]])
This much simpler single pattern may be good enough:
> library(gsubfn)
> pat <- "[-+.e0-9]*\\d"
> strapplyc(tmpstr, pat)[[1]]
[1] "32" "342" "342.1" "-3234e-10" "3234e-1"
[6] "32.1" "0.3523e10" "0.3523e-10" "-313.1"
> strapply(tmpstr, pat, as.numeric)[[1]]
[1] 3.200e+01 3.420e+02 3.421e+02 -3.234e-07 3.234e+02 3.210e+01 3.523e+09
[8] 3.523e-11 -3.131e+02
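
If the permissive pattern above ever picks up a token that as.numeric cannot parse (for example "1-2" would be returned as a single match and convert to NA), a somewhat stricter pattern is one possible alternative. The following is only a sketch in base R, not something from the thread: extract_numbers is a hypothetical helper name, and the pattern assumes every number starts with a digit (so ".5" would be read as 5) and that a sign directly before a digit belongs to the number.

```r
# Hypothetical helper (not from the thread): extract numbers with base R only.
extract_numbers <- function(x) {
  # optional sign, digits, optional fraction, optional exponent
  pat <- "[-+]?[0-9]+(\\.[0-9]+)?([eE][-+]?[0-9]+)?"
  # gregexpr finds all matches; regmatches returns them as character strings
  as.numeric(regmatches(x, gregexpr(pat, x))[[1]])
}

tmpstr <- "The first number is: 32. 342 342.1 -3234e-10 3234e-1 Another one is: 32.1. Here's a number in scientific format, 0.3523e10, and another, 0.3523e-10, and a negative, -313.1"
extract_numbers(tmpstr)
```

On this test string it should return the same nine values as the strapply call. The trade-off is that "1-2" is split into 1 and -2 rather than rejected, which may or may not be what is wanted.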
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com