[R] regular expression strikes again

Tue Jul 9 13:50:14 CEST 2013

On Jul 9, 2013, at 12:19 , PIKAL Petr wrote:

> Thanks, it works to some extent. 
> 
> The test data come from a file which was not filled in properly. If I use your suggestion I get correct values for the numbers with two digits before "," but the values which do not have a space before the number come back unchanged.
> 
>> dput(test[c(1:10,500:510)])
> c("Cl Tio2 ph 5,8 1", "Cl Tio2 ph 5,8 2", "Cl Tio2 ph 5,8 3", 
> "pH5,57 1", "pH5,57 2", "pH5,57 3", "pH4,8 1", "pH4,8 2", "pH4,8 3", 
> "pH4,12 1", "pH 9,36 2", "pH 9,36 3", "pH 9,66 1", "pH 9,66 2", 
> "pH 9,66 3", "pH 10,04 1", "pH 10,04 2", "pH 10,04 3", "RGLP 144006 pH 6,13 1", 
> "RGLP 144006 pH 6,13 2", "RGLP 144006 pH 6,13 3")
> 
>> gsub("^.* ([[:digit:]]+,[[:digit:]]*).*$", "\\1", test[c(1:10,500:510)])
> [1] "5,8"      "5,8"      "5,8"      "pH5,57 1" "pH5,57 2" "pH5,57 3"
> [7] "pH4,8 1"  "pH4,8 2"  "pH4,8 3"  "pH4,12 1" "9,36"     "9,36"    
> [13] "9,66"     "9,66"     "9,66"     "10,04"    "10,04"    "10,04"   
> [19] "6,13"     "6,13"     "6,13"    
>> 
> 
> Basically, I would like to get the one or two digits before the comma and the two digits after the comma.

Then maybe

> gsub("^.*[^[:digit:]]([[:digit:]]+,[[:digit:]]*).*$", "\\1", x)
 [1] "5,8"   "5,8"   "5,8"   "5,57"  "5,57"  "5,57"  "4,8"   "4,8"   "4,8"  
[10] "4,12"  "9,36"  "9,36"  "9,66"  "9,66"  "9,66"  "10,04" "10,04" "10,04"
[19] "6,13"  "6,13"  "6,13" 
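The pattern above only needs some non-digit character in front of the number, not specifically a space, which is why "pH5,57 1" is now handled. One caveat: a string that starts directly with the number (say "5,8 1") has no such character and would be returned unchanged. As an alternative, here is a minimal sketch that extracts the number directly with regexpr() and regmatches() rather than deleting everything around it. The sample vector, the {1,2} bounds on the digit counts, and the names m and ph are assumptions made for illustration, not part of the original data or request.

```r
# A few of the strings from the dput() output quoted above.
x <- c("Cl Tio2 ph 5,8 1", "pH5,57 1", "pH4,12 1", "pH 10,04 2",
       "RGLP 144006 pH 6,13 3")

# Find the first "digits,digits" token in each string.
# Assumption: one or two digits on each side of the comma.
# Note: regmatches() silently drops elements with no match, so the
# result can be shorter than x if some strings contain no such token.
m <- regmatches(x, regexpr("[[:digit:]]{1,2},[[:digit:]]{1,2}", x))
m

# Replace the decimal comma with a point to get numeric values.
ph <- as.numeric(sub(",", ".", m, fixed = TRUE))
ph
```

With this approach it does not matter what precedes the number, but it relies on the first "digits,digits" token in each string being the pH value, which holds for the sample above and may not hold for the whole file.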

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com