[R] extract all numbers from a string
Nick Matzke
matzke at berkeley.edu
Mon Jun 17 03:00:49 CEST 2013
Thanks *VERY* much, this is great!
I realized there were a few more cases, and I think I've now got something that
covers all the possibilities:
library(stringr)
tmpstr = "The first number is: 32. Another one is: 32.1.
Here's a number in scientific format, 0.3523e10, and
another, 0.3523e-10, and a negative, -313.1"
# Most specific forms come first: engines that try alternatives left to
# right (e.g. ICU, which current stringr uses) stop at the first one that
# matches, so putting "(\\d+)" first would split 32.1 into 32 and 1.
patternslist = c(
  "(-\\d+\\.\\d+e-\\d+)", # negative float, scientific w. negative power
  "(-\\d+\\.\\d+e\\d+)",  # negative float, scientific w. positive power
  "(\\d+\\.\\d+e-\\d+)",  # positive float, scientific w. negative power
  "(\\d+\\.\\d+e\\d+)",   # positive float, scientific w. positive power
  "(-\\d+e-\\d+)",        # negative int, scientific w. negative power
  "(-\\d+e\\d+)",         # negative int, scientific w. positive power
  "(\\d+e-\\d+)",         # positive int, scientific w. negative power
  "(\\d+e\\d+)",          # positive int, scientific w. positive power
  "(-\\d+\\.\\d+)",       # negative float
  "(\\d+\\.\\d+)",        # positive float
  "(-\\d+)",              # negative integer
  "(\\d+)"                # positive integer
)
pattern = paste(patternslist, collapse="|")
pattern
as.numeric(str_extract_all(tmpstr,pattern)[[1]])
# A more complex string
tmpstr = "The first number is: 32. 342 342.1 -3234e-10
3234e-1 Another one is: 32.1. Here's a number in scientific
format, 0.3523e10, and another, 0.3523e-10, and a negative,
-313.1"
# pattern = "(\\d)+|(-\\d)+|(\\d+\\.\\d+)|(-\\d+\\.\\d+)|(\\d+.\\d+e\\d+)|(\\d+\\.\\d+e-\\d+)|(-\\d+.\\d+e\\d+)|(-\\d+\\.\\d+e-\\d+)"
as.numeric(str_extract_all(tmpstr,pattern)[[1]])
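The twelve alternatives differ only in three optional parts (a leading minus sign, a decimal fraction, and an exponent), so they can also be folded into one pattern with optional groups. The following is a rough sketch in base R (regmatches() with gregexpr(..., perl = TRUE)), so it does not need stringr; the name num_pattern is just for illustration. Like the list above, it assumes a lowercase "e", no explicit "+" sign, and at least one digit before any decimal point, and a dash used as a range (e.g. "10-20") would come back as 10 and -20.

```r
# Sketch: one regex with optional parts instead of twelve alternatives.
# Assumed format: optional "-", digits, optional ".digits",
# optional exponent "e" / "e-" followed by digits.
num_pattern <- "-?\\d+(?:\\.\\d+)?(?:e-?\\d+)?"

tmpstr <- "The first number is: 32. 342 342.1 -3234e-10
3234e-1 Another one is: 32.1. Here's a number in scientific
format, 0.3523e10, and another, 0.3523e-10, and a negative,
-313.1"

# gregexpr() finds every match position; regmatches() pulls out the substrings
matches <- regmatches(tmpstr, gregexpr(num_pattern, tmpstr, perl = TRUE))[[1]]

# Convert the matched substrings to numbers
nums <- as.numeric(matches)
print(nums)
```

Because every part after the digits is optional and greedy, the longest form at each position is taken regardless of engine, so no ordering of alternatives is needed. On this string it gives the nine values 32, 342, 342.1, -3234e-10, 3234e-1, 32.1, 0.3523e10, 0.3523e-10 and -313.1.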
Cheers!
Nick
PS: A function version:
# Extract numbers / get numbers / get all numbers from a text string
getnums <- function(tmpstr)
{
  # Example string:
  # tmpstr = "The first number is: 32. 342 342.1 -3234e-10
  # 3234e-1 Another one is: 32.1. Here's a number in
  # scientific format, 0.3523e10, and another, 0.3523e-10, and a
  # negative, -313.1"
  library(stringr)
  # Set up the pattern; the most specific forms come first, because engines
  # that try alternatives left to right stop at the first one that matches
  patternslist = c(
    "(-\\d+\\.\\d+e-\\d+)", # negative float, scientific w. negative power
    "(-\\d+\\.\\d+e\\d+)",  # negative float, scientific w. positive power
    "(\\d+\\.\\d+e-\\d+)",  # positive float, scientific w. negative power
    "(\\d+\\.\\d+e\\d+)",   # positive float, scientific w. positive power
    "(-\\d+e-\\d+)",        # negative int, scientific w. negative power
    "(-\\d+e\\d+)",         # negative int, scientific w. positive power
    "(\\d+e-\\d+)",         # positive int, scientific w. negative power
    "(\\d+e\\d+)",          # positive int, scientific w. positive power
    "(-\\d+\\.\\d+)",       # negative float
    "(\\d+\\.\\d+)",        # positive float
    "(-\\d+)",              # negative integer
    "(\\d+)"                # positive integer
  )
  pattern = paste(patternslist, collapse="|")
  # Get the numbers
  nums_from_tmpstr = as.numeric(str_extract_all(tmpstr,pattern)[[1]])
  # Return them
  return(nums_from_tmpstr)
}
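The original question mentions a series of strings, while getnums() handles one string at a time (it keeps only element [[1]] of the result). As a sketch under the same assumptions about number formats, a vectorised variant could return one numeric vector per input string. getnums_all() is a hypothetical name, and its default pattern folds the optional minus sign, decimal part, and exponent into a single regex rather than listing twelve alternatives.

```r
# Hypothetical helper (not from the original post): extract numbers from
# each element of a character vector, returning a list of numeric vectors.
getnums_all <- function(strings,
                        pattern = "-?\\d+(?:\\.\\d+)?(?:e-?\\d+)?") {
  # One list element per input string, holding that string's matches
  found <- regmatches(strings, gregexpr(pattern, strings, perl = TRUE))
  # Convert each character vector of matches to numeric
  lapply(found, as.numeric)
}

strs <- c("The first number is: 32. Another one is: 32.1.",
          "Here's a negative, -313.1, and a small one, 0.3523e-10",
          "No numbers in this one")
res <- getnums_all(strs)
str(res)
```

Strings without any number give numeric(0) rather than an error, since regmatches() returns character(0) for them.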
On 6/15/13 10:46 PM, arun wrote:
>
>
> Hi,
> One way would be:
>
> library(stringr)
> tmpstr = "The first number is: 32. Another one is: 32.1.
> Here's a number in scientific format, 0.3523e10, and
> another, 0.3523e-10, and a negative, -313.1"
> pattern<- "(-\\d+\\.\\d+)|(\\d+\\.\\d+e-\\d+)|(\\d+\\.\\d+e\\d+)|(\\d+\\.\\d+)|(\\d+)"
> str_extract_all(tmpstr,pattern)[[1]]
> #[1] "32" "32.1" "0.3523e10" "0.3523e-10" "-313.1"
> as.numeric(str_extract_all(tmpstr,pattern)[[1]])
> A.K.
>
>
>
> ----- Original Message -----
> From: Nick Matzke <matzke at berkeley.edu>
> To: R-help at r-project.org
> Cc:
> Sent: Sunday, June 16, 2013 1:06 AM
> Subject: [R] extract all numbers from a string
>
> Hi all,
>
> I have been beating my head against this problem for a bit,
> but I can't figure it out.
>
> I have a series of strings of variable length, and each will
> have one or more numbers, of varying format. E.g., I might
> have:
>
>
> tmpstr = "The first number is: 32. Another one is: 32.1.
> Here's a number in scientific format, 0.3523e10, and
> another, 0.3523e-10, and a negative, -313.1"
>
> How could I get R to just give me a list of numerics
> containing the numbers therein?
>
> Thanks very much to the regexp wizards!
>
> Cheers,
> Nick
>
>
>
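Whether the order of the alternatives matters depends on the regex engine. In Perl-style engines, such as PCRE (base R with perl = TRUE) or ICU (used by current versions of stringr through stringi), alternatives are tried left to right and the first one that matches wins, even if a later one would match more text. A minimal base-R illustration on "32.1", written for this explanation rather than taken from either post:

```r
x <- "Another one is: 32.1."

# Integer alternative listed first: at "3" the engine accepts "32" and stops,
# then later matches the "1" after the decimal point on its own
int_first <- regmatches(x, gregexpr("(\\d+)|(\\d+\\.\\d+)", x, perl = TRUE))[[1]]
print(int_first)    # [1] "32" "1"

# Float alternative listed first: "32.1" is matched whole
float_first <- regmatches(x, gregexpr("(\\d+\\.\\d+)|(\\d+)", x, perl = TRUE))[[1]]
print(float_first)  # [1] "32.1"
```

POSIX-style engines prefer the longest match instead, so results can differ between engines; listing the longest forms first gives the same answer in both.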
--
====================================================
Nicholas J. Matzke
Ph.D. Candidate, Graduate Student Researcher
Huelsenbeck Lab
Center for Theoretical Evolutionary Genomics
4151 VLSB (Valley Life Sciences Building)
Department of Integrative Biology
University of California, Berkeley
Graduate Student Instructor, IB200B
Principles of Phylogenetics: Ecology and Evolution
http://ib.berkeley.edu/courses/ib200b/
http://phylo.wikidot.com/
Lab websites:
http://ib.berkeley.edu/people/lab_detail.php?lab=54
http://fisher.berkeley.edu/cteg/hlab.html
Dept. personal page:
http://ib.berkeley.edu/people/students/person_detail.php?person=370
Lab personal page:
http://fisher.berkeley.edu/cteg/members/matzke.html
Lab phone: 510-643-6299
Dept. fax: 510-643-6264
Cell phone: 510-301-0179
Email: matzke at berkeley.edu
Mailing address:
Department of Integrative Biology
1005 Valley Life Sciences Building #3140
Berkeley, CA 94720-3140