[R] regular expression strikes again

PIKAL Petr petr.pikal at precheza.cz
Tue Jul 9 12:19:23 CEST 2013


Thanks, it works to some extent. 

The test data come from a file which is not filled in properly. If I use your suggestion I get correct values for the entries with two digits before the comma, but entries which have no space before the number do not match and are returned unchanged.

> dput(test[c(1:10,500:510)])
c("Cl Tio2 ph 5,8 1", "Cl Tio2 ph 5,8 2", "Cl Tio2 ph 5,8 3", 
"pH5,57 1", "pH5,57 2", "pH5,57 3", "pH4,8 1", "pH4,8 2", "pH4,8 3", 
"pH4,12 1", "pH 9,36 2", "pH 9,36 3", "pH 9,66 1", "pH 9,66 2", 
"pH 9,66 3", "pH 10,04 1", "pH 10,04 2", "pH 10,04 3", "RGLP 144006 pH 6,13 1", 
"RGLP 144006 pH 6,13 2", "RGLP 144006 pH 6,13 3")

> gsub("^.* ([[:digit:]]+,[[:digit:]]*).*$", "\\1", test[c(1:10,500:510)])
 [1] "5,8"      "5,8"      "5,8"      "pH5,57 1" "pH5,57 2" "pH5,57 3"
 [7] "pH4,8 1"  "pH4,8 2"  "pH4,8 3"  "pH4,12 1" "9,36"     "9,36"    
[13] "9,66"     "9,66"     "9,66"     "10,04"    "10,04"    "10,04"   
[19] "6,13"     "6,13"     "6,13"    
>

Basically I would like to get the one or two digits before the comma and the two digits after the comma.
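One possible way around this, as a minimal sketch only (checked on a handful of the strings above, not on the full file): match the number itself with regexpr() and pull it out with regmatches(), instead of substituting away everything around it. The sketch assumes every string contains exactly one "digits,digits" number; the pattern and the names pat, res and num are only illustrative.

```r
# A few of the strings from the dput() output above
test <- c("Cl Tio2 ph 5,8 1", "pH5,57 1", "pH4,12 1",
          "pH 9,36 2", "pH 10,04 1", "RGLP 144006 pH 6,13 1")

# One or two digits, a comma, then one or two digits.
# No leading "^.*", so nothing greedy can swallow the first digit,
# and no space is required in front of the number.
pat <- "[[:digit:]]{1,2},[[:digit:]]{1,2}"

# regexpr() locates the first (leftmost) match in each string,
# regmatches() extracts the matched text
res <- regmatches(test, regexpr(pat, test))
print(res)

# Optional: convert the decimal comma to a decimal point and to numeric
num <- as.numeric(sub(",", ".", res, fixed = TRUE))
print(num)
```

Note that regmatches() silently drops elements without a match, so the result can be shorter than the input if some strings contain no such number.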

Thanks anyway
Petr

> -----Original Message-----
> From: peter dalgaard [mailto:pdalgd at gmail.com]
> Sent: Tuesday, July 09, 2013 11:59 AM
> To: PIKAL Petr
> Cc: r-help
> Subject: Re: [R] regular expression strikes again
> 
> 
> On Jul 9, 2013, at 11:45 , PIKAL Petr wrote:
> 
> > Dear experts in regexpr.
> >
> > I have this
> >
> > dput(test[500:510])
> > c("pH 9,36 2", "pH 9,36 3", "pH 9,66 1", "pH 9,66 2", "pH 9,66 3",
> > "pH 10,04 1", "pH 10,04 2", "pH 10,04 3", "RGLP 144006 pH 6,13 1",
> > "RGLP 144006 pH 6,13 2", "RGLP 144006 pH 6,13 3")
> >
> > and I want something like this
> >
> > gsub("^.*([[:digit:]],[[:digit:]]*).*$", "\\1", test[500:510])
> >  [1] "9,36" "9,36" "9,66" "9,66" "9,66" "0,04" "0,04" "0,04" "6,13" "6,13"
> > [11] "6,13"
> >
> > but with 10,04 values instead of 0,04.
> >
> > I tried
> > gsub("^.*([[:digit:]]+,[[:digit:]]*).*$", "\\1", test[500:510])
> >
> > or other variations but without any success.
> >
> 
> 
> Presumably the ^.* is too greedy. Perhaps add a space? I.e.,
> 
> gsub("^.* ([[:di......
> 
> 
> --
> Peter Dalgaard, Professor
> Center for Statistics, Copenhagen Business School Solbjerg Plads 3,
> 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


