[R] extract all numbers from a string

Nick Matzke matzke at berkeley.edu
Mon Jun 17 08:31:20 CEST 2013


Ooh, nice! Thanks!
Nick


On 6/16/13 8:42 PM, Gabor Grothendieck wrote:
> On Sun, Jun 16, 2013 at 9:00 PM, Nick Matzke <matzke at berkeley.edu> wrote:
>> Thanks *VERY* much, this is great!
>>
>> I realized a few more cases, I think I've got something that covers all the
>> possibilities now:
>>
>>
>>
>> library(stringr)
>> tmpstr = "The first number is: 32.  Another one is: 32.1. Here's a number in scientific format, 0.3523e10, and another, 0.3523e-10, and a negative, -313.1"
>>
>> patternslist = NULL
>> p=0
>> patternslist[[(p=p+1)]] = "(\\d+)"                # positive integer
>> patternslist[[(p=p+1)]] = "(-\\d+)"               # negative integer
>> patternslist[[(p=p+1)]] = "(\\d+\\.\\d+)"         # positive float
>> patternslist[[(p=p+1)]] = "(\\d+\\.\\d+e\\d+)"    # positive float, scientific w. positive power
>> patternslist[[(p=p+1)]] = "(\\d+\\.\\d+e-\\d+)"   # positive float, scientific w. negative power
>> patternslist[[(p=p+1)]] = "(-\\d+\\.\\d+)"        # negative float
>> patternslist[[(p=p+1)]] = "(-\\d+\\.\\d+e\\d+)"   # negative float, scientific w. positive power
>> patternslist[[(p=p+1)]] = "(-\\d+\\.\\d+e-\\d+)"  # negative float, scientific w. negative power
>>
>> patternslist[[(p=p+1)]] = "(\\d+e\\d+)"           # positive int, scientific w. positive power
>> patternslist[[(p=p+1)]] = "(\\d+e-\\d+)"          # positive int, scientific w. negative power
>> patternslist[[(p=p+1)]] = "(-\\d+e\\d+)"          # negative int, scientific w. positive power
>> patternslist[[(p=p+1)]] = "(-\\d+e-\\d+)"         # negative int, scientific w. negative power
>>
>> pattern = paste(patternslist, collapse="|", sep="")
>> pattern
>> as.numeric(str_extract_all(tmpstr,pattern)[[1]])
>>
>> # A more complex string
>> tmpstr = "The first number is: 32.  342 342.1   -3234e-10 3234e-1 Another one is: 32.1. Here's a number in scientific format, 0.3523e10, and another, 0.3523e-10, and a negative, -313.1"
>> #pattern = "(\\d)+|(-\\d)+|(\\d+\\.\\d+)|(-\\d+\\.\\d+)|(\\d+.\\d+e\\d+)|(\\d+\\.\\d+e-\\d+)|(-\\d+.\\d+e\\d+)|(-\\d+\\.\\d+e-\\d+)"
>> as.numeric(str_extract_all(tmpstr,pattern)[[1]])
>
> This much simpler single pattern may be good enough:
>
>> library(gsubfn)
>> pat <- "[-+.e0-9]*\\d"
>> strapplyc(tmpstr, pat)[[1]]
> [1] "32"         "342"        "342.1"      "-3234e-10"  "3234e-1"
> [6] "32.1"       "0.3523e10"  "0.3523e-10" "-313.1"
>> strapply(tmpstr, pat, as.numeric)[[1]]
> [1]  3.200e+01  3.420e+02  3.421e+02 -3.234e-07  3.234e+02  3.210e+01  3.523e+09
> [8]  3.523e-11 -3.131e+02
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>
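
A note on the two approaches quoted above, offered as a minimal sketch rather than a definitive answer. Regex engines with ordered alternation try the alternatives left to right, so a list that starts with "(\\d+)" may return "32.1" as "32" and "1". The loose character class in "[-+.e0-9]*\\d" avoids that, but it would also pick up a token such as "1-2" whole. One middle path is a single pattern with an optional minus sign, an optional fraction, and an optional exponent. The helper name extract_numbers below is hypothetical (it is not from the thread), and the sketch assumes base R only (gregexpr and regmatches), so neither stringr nor gsubfn is needed:

```r
# Hypothetical helper, not from the thread: extract numbers with base R only.
extract_numbers <- function(x) {
  # optional minus sign, integer part, optional fraction, optional exponent
  pat <- "-?\\d+(\\.\\d+)?([eE][-+]?\\d+)?"
  # gregexpr() finds every match; regmatches() returns the matched substrings
  tokens <- regmatches(x, gregexpr(pat, x, perl = TRUE))[[1]]
  as.numeric(tokens)
}

# The more complex test string from the quoted message, kept on one line
tmpstr <- "The first number is: 32.  342 342.1   -3234e-10 3234e-1 Another one is: 32.1. Here's a number in scientific format, 0.3523e10, and another, 0.3523e-10, and a negative, -313.1"

nums <- extract_numbers(tmpstr)
print(nums)
```

On this test string the sketch should pick out the same nine tokens as the strapplyc call above. It is deliberately stricter than the character-class pattern: a leading plus sign is dropped, and a number written without a leading digit, such as ".5", is read as 5, so the pattern would need extending if those forms matter.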

-- 
====================================================
Nicholas J. Matzke
Ph.D. Candidate, Graduate Student Researcher
