[R] Cut intervals (character) to numeric midpoint; regex problem

David Winsemius dwinsemius at comcast.net
Tue Dec 1 20:41:40 CET 2009


Starting with the head of a 499-element matrix whose column names are  
now the labels from a cut() operation, I needed to get to a vector of  
midpoints to serve as the basis for plotting a calibration curve  
( exp(linear predictor) vs.  :

 > dput(head(dimnames(mtcal)[2][[1]])) # was starting point


testvec <- c("(-8.616,-3.084]", "(-3.084,-2.876]", "(-2.876,-2.756]",  
"(-2.756,-2.668]",
"(-2.668,-2.597]", "(-2.597,-2.539]")

I started this message intending to ask for an answer, but kept  
asking myself whether I had really checked the docs and tested my  
understanding. I eventually solved it using the gsubfn function from  
the gsubfn package:

library(gsubfn)
# the pattern must stay on one line; "\\." matches a literal decimal point
testintvl <- as.numeric(gsubfn(
"\\((-?[[:digit:]]+\\.?[[:digit:]]*),(-?[[:digit:]]+\\.?[[:digit:]]*)\\]",
~ (as.numeric(x)+as.numeric(y))/2,  testvec))

# I did discover that a line break in the middle of the pattern will  
not give the desired results, so if the pattern is split by your mail  
client, be sure to rejoin it into one line in the console.
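
For anyone without gsubfn installed, here is a minimal base-R sketch of the same idea. It is not the code I ran above: it assumes every label has the "(lower,upper]" shape, and the names pat, lower, upper and midpts are made up for illustration. The two capture groups are pulled out with sub() and backreferences instead of being passed to a formula.

```r
# First few labels from the post, as produced by cut()
testvec <- c("(-8.616,-3.084]", "(-3.084,-2.876]", "(-2.876,-2.756]")

# Same two capture groups as the gsubfn pattern, anchored at both ends;
# "\\." is a literal decimal point and "?" makes it optional
pat <- "^\\((-?[[:digit:]]+\\.?[[:digit:]]*),(-?[[:digit:]]+\\.?[[:digit:]]*)\\]$"

lower <- as.numeric(sub(pat, "\\1", testvec))  # text of the first group
upper <- as.numeric(sub(pat, "\\2", testvec))  # text of the second group
midpts <- (lower + upper) / 2
print(midpts)
```

A label that does not match pat is returned unchanged by sub(), so as.numeric() turns it into NA with a warning rather than silently producing a wrong midpoint.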

The "?" after each decimal point is there because, when the decimal  
point was required, I got 4 NAs around the median linear predictor,  
where an interval endpoint is a whole number:

 > dimnames(mtcal)[2][[1]][which(is.na(testintvl))]
[1] "(-1.008,-1]"  "(-1,-0.9922]" "(0.9914,1]"   "(1,1.009]"
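
The effect can be reproduced with grepl() in base R. This is only a sketch: the "strict" pattern is my reconstruction of a version that requires the decimal point (an assumption, since the earlier pattern is not shown), and "lenient" is the version with the "?" added. A label the pattern does not match is left unchanged by the substitution, so as.numeric() returns NA for it.

```r
labels <- c("(-1.008,-1]", "(-1,-0.9922]", "(0.9914,1]", "(1,1.009]",
            "(-2.597,-2.539]")

# Hypothetical stricter pattern: a decimal point is required in both numbers
strict  <- "^\\((-?[[:digit:]]+\\.[[:digit:]]*),(-?[[:digit:]]+\\.[[:digit:]]*)\\]$"
# Pattern with the optional decimal point
lenient <- "^\\((-?[[:digit:]]+\\.?[[:digit:]]*),(-?[[:digit:]]+\\.?[[:digit:]]*)\\]$"

strict_ok  <- grepl(strict, labels)   # FALSE FALSE FALSE FALSE TRUE
lenient_ok <- grepl(lenient, labels)  # TRUE for all five labels
print(rbind(strict_ok, lenient_ok))
```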

So a better test vector would be:

testvec <- c("(-8.616,-3.084]", "(-3.084,-2.876]", "(-2.876,-2.756]",  
"(-2.756,-2.668]",
"(-2.668,-2.597]", "(-2.597,-2.539]", "(-1.008,-1]",  "(-1,-0.9922]",  
"(0.9914,1]", "(1,1.009]" )

 > testintvl <- as.numeric(gsubfn(
 + "\\((-?[[:digit:]]+\\.?[[:digit:]]*),(-?[[:digit:]]+\\.?[[:digit:]]*)\\]",
 + ~ (as.numeric(x)+as.numeric(y))/2,  testvec))

 > testintvl
  [1] -5.8500 -2.9800 -2.8160 -2.7120 -2.6325 -2.5680 -1.0040 -0.9961  0.9957  1.0045
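
If the breaks that were handed to cut() are still around, the midpoints can also be computed from them directly, with no regex at all. The sketch below uses made-up breaks and data (breaks, x, f and mids are illustrative names, not objects from my session). It also sidesteps the rounding in the labels, which cut() formats to only a few significant digits.

```r
breaks <- c(-3, -1, 0, 1, 3)   # hypothetical break points
x <- c(-2.5, -0.5, 0.2, 2)     # hypothetical data

f <- cut(x, breaks)            # factor with "(lower,upper]" labels
print(levels(f))

# Midpoint of each interval: average of consecutive break points
mids <- (head(breaks, -1) + tail(breaks, -1)) / 2

# Midpoint of the interval each observation falls into
x_mid <- mids[as.integer(f)]
print(x_mid)
```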

I offer this to those who may feel regex-challenged (as I often do).  
The gsubfn function is pretty slick. I don't see an author listed for  
the function, but the author of the package documentation is Gabor  
Grothendieck.

--

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



