[R] regular expressions : extracting numbers

Marc Schwartz marc_schwartz at comcast.net
Mon Jul 30 14:17:43 CEST 2007


On Mon, 2007-07-30 at 13:58 +0200, GOUACHE David wrote:
> Hello all,
> 
> I have a vector of character strings, in which I have letters,
> numbers, and symbols. What I wish to do is obtain a vector of the same
> length with just the numbers.
> A quick example -
> 
> extract of the original vector :
> "lema, rb 2%" "rb 2%" "rb 3%" "rb 4%" "rb 3%" "rb 2%,mineuse" "rb"
> "rb" "rb 12" "rb" "rj 30%" "rb" "rb" "rb 25%" "rb" "rb" "rb" "rj, rb"
> 
> and the type of thing I wish to end up with :
> "2" "2" "3" "4" "3" "2" "" "" "12" "" "30" "" "" "25" "" "" "" ""
> 
> or, instead of "", NA would be acceptable (actually it would almost be
> better for me)
> 
> Anyways, I've been battling with gsub() and things of the sort, but
> I'm drowning in the regular expressions, despite a few hours of
> looking at Perl tutorials...
> So if anyone can help me out, it would be greatly appreciated!!
> 
> In advance, thanks very much.

Try this:

> Vec
 [1] "lema, rb 2%"   "rb 2%"         "rb 3%"         "rb 4%"        
 [5] "rb 3%"         "rb 2%,mineuse" "rb"            "rb"           
 [9] "rb 12"         "rb"            "rj 30%"        "rb"           
[13] "rb"            "rb 25%"        "rb"            "rb"           
[17] "rb"            "rj, rb" 

> gsub("[^0-9]", "", Vec)
 [1] "2"  "2"  "3"  "4"  "3"  "2"  ""   ""   "12" ""   "30" ""   ""  
[14] "25" ""   ""   ""   ""  
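
Since NA was mentioned as preferable to "", here is one possible
follow-up, offered as a sketch only. The names digits, digits_na and
nums are illustrative, and the vector is a shortened copy of the one
above.

```r
# Shortened copy of the example vector
Vec <- c("lema, rb 2%", "rb 2%", "rb 12", "rb", "rj 30%", "rj, rb")

# Strip every non-digit character, as in the gsub() call above
digits <- gsub("[^0-9]", "", Vec)

# Option 1: stay with character strings, but turn "" into NA
digits_na <- digits
digits_na[!nzchar(digits_na)] <- NA

# Option 2: convert to integer; empty strings become NA
nums <- as.integer(digits)

print(digits_na)
print(nums)
```

Option 2 is probably the more convenient one if the values are going
to be used in calculations.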


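The search pattern here is [^0-9], a negated character class: the ^
inside the brackets means "not", so it matches any single character
that is not a digit from 0 through 9. gsub() replaces every such
character with the empty string, so only the digits remain.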
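
One caveat with removing all non-digits: if a string holds more than
one number, the digits are run together. The sketch below uses a
made-up input ("rb 2% to 5%" is hypothetical, not from the data above)
and pulls out only the first run of digits instead. It assumes a
version of R recent enough to provide regmatches() in base; the names
x, m and first_num are illustrative.

```r
# Hypothetical input; the second element contains two separate numbers
x <- c("rb 2%", "rb 2% to 5%", "rb")

# Removing non-digits merges the two numbers in the second element
merged <- gsub("[^0-9]", "", x)

# Locate the first run of one or more digits in each string;
# regexpr() returns -1 where there is no match
m <- regexpr("[0-9]+", x)

# Start from all NA, then fill in the strings that did match;
# regmatches() returns only the matched substrings
first_num <- rep(NA_character_, length(x))
first_num[m != -1] <- regmatches(x, m)

print(merged)
print(first_num)
```

This also gives NA directly for the entries with no number in them.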

See ?regex and/or http://www.regular-expressions.info/

HTH,

Marc Schwartz


