[R] Extracting numbers from somewhere within strings

Peter Dalgaard p.dalgaard at biostat.ku.dk
Wed Apr 28 12:17:18 CEST 2004


"Lutz Prechelt" <prechelt at pcpool.mi.fu-berlin.de> writes:

> Hello everybody,
> 
> I have a bunch of strings like this:
> "IBM POWER4+ 1.9GHz"                  
> "IBM RS64-III 500MHz"              
> "IBM RS64-IV 600 MHz"                 
> "IBM RS64 IV 750MHz"               
> "Intel Itanium 2 Processor 6M 1.5GHz" 
> "Intel Itanium2 1 Ghz"             
> "Intel Itanium2 1.5GHz"               
> "Intel MP 1.6GHz"                   
> 
> I want to extract the processor speed.
> 
> I am using
>   grep("MHz", tpc$cpu, ignore.case=T)
>   grep("GHz", tpc$cpu, ignore.case=T)
> to extract the unit, because there are only these two.
> 
> But how to extract the number before it?
> (I am using R 1.8.0)
> 
> In Perl one would match a regexp such as
>   /([0-9.]+) ?[MG][Hh][Zz]/
> and then obtain the number as $1.
> But the capability of returning $1 is apparently not
> implemented in grep() or any other function I could find.
> 
> How is it best done?
> 
> Thanks in advance,

gsub() supports backreferences (\1 etc.) in the replacement string. For instance

> gsub("^.* ([0-9\\.]+) *[MG][Hh]z$","\\1",x)
[1] "1.9" "500" "600" "750" "1.5" "1"   "1.5" "1.6"

(Not exactly trivial to get that right, but neither is it in Perl...)
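
If the unit is wanted as well, the same idea can be stretched to two capture groups. What follows is only a sketch under some assumptions: the vector name cpu and the variables pat, value, prefix and mhz are made up for illustration, the data are the eight strings from the question, and the pattern is a slightly loosened variant of the one above (it also accepts an upper-case Z and trailing blanks). It uses sub() rather than gsub(), since an anchored pattern can match only once per string.

```r
# Sample data copied from the question; `cpu` is a hypothetical name.
cpu <- c("IBM POWER4+ 1.9GHz", "IBM RS64-III 500MHz",
         "IBM RS64-IV 600 MHz", "IBM RS64 IV 750MHz",
         "Intel Itanium 2 Processor 6M 1.5GHz", "Intel Itanium2 1 Ghz",
         "Intel Itanium2 1.5GHz", "Intel MP 1.6GHz")

# Group 1 is the number, group 2 the unit prefix (M or G).
pat <- "^.* ([0-9.]+) *([MG])[Hh][Zz] *$"

# Replace the whole string by one captured group at a time.
value  <- as.numeric(sub(pat, "\\1", cpu))
prefix <- sub(pat, "\\2", cpu)

# Express every speed in MHz.
mhz <- value * ifelse(prefix == "G", 1000, 1)
print(mhz)
```

One caveat worth keeping in mind: sub() and gsub() return a string unchanged when the pattern does not match, so as.numeric() would then yield NA with a warning. Checking with grep(pat, cpu) first shows which elements actually match.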

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907



