[R] Cut intervals (character) to numeric midpoint; regex problem

David Winsemius dwinsemius at comcast.net
Tue Dec 1 21:14:08 CET 2009


I'm sitting here chuckling. Your solution is just so "pure".

I would offer an enhancement. When I tested with my cuts that had "-"
before the digits, your solution dropped them, so my suggestion for the
pattern would be:   "[-[:digit:].]+"

I will admit that I thought it might fail with positive numbers but it  
does not seem to:

 > interv <- strapply(testvec, "[-[:digit:].]+", as.numeric, simplify = TRUE)
 > interv
       [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]    [,8]   [,9] [,10]
[1,] -8.616 -3.084 -2.876 -2.756 -2.668 -2.597 -1.008 -1.0000 0.9914 1.000
[2,] -3.084 -2.876 -2.756 -2.668 -2.597 -2.539 -1.000 -0.9922 1.0000 1.009
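
For readers without gsubfn installed, the same extract-then-average idea can be sketched in base R. This is only an illustration under stated assumptions: cut_midpoints is a made-up helper name, and the sketch assumes every label holds exactly two plain decimal endpoints (labels in scientific notation such as "1e+03" would need a different pattern).

```r
# Base-R sketch: pull the signed numbers out of cut() labels and average
# each pair. `cut_midpoints` is a hypothetical helper, not part of any package.
cut_midpoints <- function(labels) {
  # Same character class as above: a literal "-", digits, and a literal "."
  matches <- regmatches(labels, gregexpr("[-[:digit:].]+", labels))
  # One midpoint per label: the mean of the endpoints found in it
  vapply(matches, function(m) mean(as.numeric(m)), numeric(1))
}

testvec <- c("(-8.616,-3.084]", "(-1.008,-1]", "(-1,-0.9922]",
             "(0.9914,1]", "(1,1.009]")
mids <- cut_midpoints(testvec)
```

Positive endpoints go through unchanged because the "-" inside the brackets is just one more allowed character, not a required prefix. The flip side is that the class would also accept a stray "-" or "." standing alone, so the sketch relies on the labels being well formed.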

I was not able to get that pattern to give acceptable results in  
gsubfn, so I obviously need to study this more closely.

-- 
David.

On Dec 1, 2009, at 2:47 PM, Gabor Grothendieck wrote:

> You also might want to look at
>
> demo("gsubfn-cut")
>
>
> On Tue, Dec 1, 2009 at 2:41 PM, David Winsemius <dwinsemius at comcast.net> wrote:
> Starting with the head of a 499 element matrix whose column names  
> are now the labels from a cut() operation, I needed to get to a  
> vector of midpoints to serve as the basis for plotting a calibration  
> curve ( exp(linear predictor) vs.  :
>
> > dput(head(dimnames(mtcal)[2][[1]])) # was starting point
>
>
> testvec <- c("(-8.616,-3.084]", "(-3.084,-2.876]",  
> "(-2.876,-2.756]", "(-2.756,-2.668]",
> "(-2.668,-2.597]", "(-2.597,-2.539]")
>
> I started this message with the thought of requesting an answer but
> kept asking myself if I really had checked the docs and tested my
> understanding. I eventually solved it using the gsubfn function from
> the gsubfn package:
>
> testintvl <-as.numeric(gsubfn("\\((-?[[:digit:]]+.?[[:digit:]]*),(-?[[:digit:]]+.?[[:digit:]]*)\\]",
> ~ (as.numeric(x)+as.numeric(y))/2,  testvec))
>
> # I did discover that carriage returns in the middle of the pattern
> will not give desired results, so if this is broken by your mail
> client, be sure to rejoin it in the console.
>
> The extra "?"'s after the decimal point are in there because I had 4  
> NA's around the median linear predictor:
>
> > dimnames(mtcal)[2][[1]][which(is.na(testintvl))]
> [1] "(-1.008,-1]"  "(-1,-0.9922]" "(0.9914,1]"   "(1,1.009]"
>
> So a better test vector would be:
>
> testvec <- c("(-8.616,-3.084]", "(-3.084,-2.876]",  
> "(-2.876,-2.756]", "(-2.756,-2.668]",
> "(-2.668,-2.597]", "(-2.597,-2.539]", "(-1.008,-1]",   
> "(-1,-0.9922]", "(0.9914,1]", "(1,1.009]" )
>
> > testintvl <-as.numeric(gsubfn("\\((-?[[:digit:]]+.?[[:digit:]]*),(-?[[:digit:]]+.?[[:digit:]]*)\\]",
> + ~ (as.numeric(x)+as.numeric(y))/2,  testvec))
>
> > testintvl
>  [1] -5.8500 -2.9800 -2.8160 -2.7120 -2.6325 -2.5680 -1.0040 -0.9961  0.9957  1.0045
>
> I offer this to those who may feel regex challenged (as I often do).  
> The gsubfn function is pretty slick. I don't see an author listed  
> for the function, but the author of the package documents is Gabor  
> Grothendieck.
>
> --
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



