[R] Extracting numbers from somewhere within strings
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Wed Apr 28 12:17:18 CEST 2004
"Lutz Prechelt" <prechelt at pcpool.mi.fu-berlin.de> writes:
> Hello everybody,
>
> I have a bunch of strings like this:
> "IBM POWER4+ 1.9GHz"
> "IBM RS64-III 500MHz"
> "IBM RS64-IV 600 MHz"
> "IBM RS64 IV 750MHz"
> "Intel Itanium 2 Processor 6M 1.5GHz"
> "Intel Itanium2 1 Ghz"
> "Intel Itanium2 1.5GHz"
> "Intel MP 1.6GHz"
>
> I want to extract the processor speed.
>
> I am using
> grep("MHz", tpc$cpu, ignore.case=T)
> grep("GHz", tpc$cpu, ignore.case=T)
> to extract the unit, because there are only these two.
>
> But how to extract the number before it?
> (I am using R 1.8.0)
>
> In Perl one would match a regexp such as
> /([0-9.]+) ?[MG][Hh][Zz]/
> and then obtain the number as $1.
> But the capability of returning $1 is apparently not
> implemented in grep() or any other function I could find.
>
> How is it best done?
>
> Thanks in advance,
gsub() supports backreferences (\1 etc.) in the replacement string. For instance:
> gsub("^.* ([0-9\\.]+) *[MG][Hh]z$","\\1",x)
[1] "1.9" "500" "600" "750" "1.5" "1"   "1.5" "1.6"
(Not exactly trivial to get that right, but neither is it in Perl...)
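
The same idea can be pushed one step further to get numeric speeds in a common unit. The following is only a sketch: the variable names (cpu, pat, value, unit, speed_mhz) are made up for illustration, the data are the eight strings from the question, and it assumes that every string ends in a number followed by MHz or GHz in some mix of upper and lower case.

```r
# Sample data copied from the question; `cpu` is a hypothetical variable name
# standing in for tpc$cpu.
cpu <- c("IBM POWER4+ 1.9GHz",
         "IBM RS64-III 500MHz",
         "IBM RS64-IV 600 MHz",
         "IBM RS64 IV 750MHz",
         "Intel Itanium 2 Processor 6M 1.5GHz",
         "Intel Itanium2 1 Ghz",
         "Intel Itanium2 1.5GHz",
         "Intel MP 1.6GHz")

# One pattern with two capture groups:
#   \\1 = the number, \\2 = the unit prefix (M or G).
pat <- "^.* ([0-9.]+) *([MG])[Hh][Zz]$"

value <- as.numeric(sub(pat, "\\1", cpu))  # numeric part of the speed
unit  <- sub(pat, "\\2", cpu)              # "M" or "G"

# Convert GHz to MHz so that all speeds are on the same scale.
speed_mhz <- ifelse(unit == "G", value * 1000, value)
```

Note that sub() returns a string unchanged when the pattern does not match, so a string in some other format would come out of as.numeric() as NA (with a warning) rather than being silently misread.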
--
O__ ---- Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907