[R] Character (1a, 1b) to numeric

Jean-Louis Abitbol abitbol at sent.com
Fri Jul 10 22:19:05 CEST 2020


Many thanks to all. This help-list is wonderful.

I used Rich Heiberger's solution based on match, and found something to learn in each answer.
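
For readers of the archive, a minimal sketch of that match-based lookup. The expression xn[match(x, xc)] is the one shown in John Fox's benchmark quoted below; the short data vector x here is made up purely for illustration.

```r
xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")  # stage labels
xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)            # values used for plotting

x <- c("2b", "1", "1c", "2a")   # hypothetical patient data
z <- xn[match(x, xc)]           # position of each label in xc indexes into xn
z
# [1] 2.5 1.0 1.7 2.3
```

A label that is not in xc (say "3a") comes back as NA rather than raising an error, so it may be worth checking anyNA(z) before plotting.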

Off topic: I also very much enjoyed his 2008 paper on the graphical presentation of safety data.

Best wishes.


On Fri, Jul 10, 2020, at 10:02 PM, Fox, John wrote:
> Hi,
> 
> We've had several solutions, and I was curious about their relative 
> efficiency. Here's a test with a moderately large data vector:
> 
> > library("microbenchmark")
> > set.seed(123) # for reproducibility
> > x <- sample(xc, 1e4, replace=TRUE) # "data"
> > microbenchmark(John = John <- xn[x], 
> +                Rich = Rich <- xn[match(x, xc)], 
> +                Jeff = Jeff <- {
> +                 n <- as.integer( sub( "[a-i]$", "", x ) )
> +                 d <- match( sub( "^\\d+", "", x ), letters[1:9] )
> +                 d[ is.na( d ) ] <- 0
> +                 n + d / 10
> +                 },
> +                David = David <- as.numeric(gsub("a", ".3", 
> +                                      gsub("b", ".5", 
> +                                           gsub("c", ".7", x)))),
> +                times=1000L
> +                )
> Unit: microseconds
>   expr       min        lq       mean     median         uq       max neval cld
>   John   228.816   345.371   513.5614   503.5965   533.0635  10829.08  1000 a  
>   Rich   217.395   343.035   534.2074   489.0075   518.3260  15388.96  1000 a  
>   Jeff 10325.471 13070.737 15387.2545 15397.9790 17204.0115 153486.94  1000  b 
>  David 14256.673 18148.492 20185.7156 20170.3635 22067.6690  34998.95  1000   c
> > all.equal(John, Rich)
> [1] TRUE
> > all.equal(John, David)
> [1] "names for target but not for current"
> > all.equal(John, Jeff)
> [1] "names for target but not for current"
> [2] "Mean relative difference: 0.1498243"
> 
> Of course, efficiency isn't the only consideration, and aesthetically 
> (and no doubt subjectively) I prefer Rich Heiberger's solution. OTOH, 
> Jeff's solution is more general in that it generates the correspondence 
> between letters and numbers. The argument for Jeff's solution would, 
> however, be stronger if it gave the desired answer.
> 
> Best,
>  John
> 
> > On Jul 10, 2020, at 3:28 PM, David Carlson <dcarlson using tamu.edu> wrote:
> > 
> > Here is a different approach:
> > 
> > xc <-  c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
> > xn <- as.numeric(gsub("a", ".3", gsub("b", ".5", gsub("c", ".7", xc))))
> > xn
> > # [1] 1.0 1.3 1.5 1.7 2.0 2.3 2.5 2.7
> > 
> > David L Carlson
> > Professor Emeritus of Anthropology
> > Texas A&M University
> > 
> > On Fri, Jul 10, 2020 at 1:10 PM Fox, John <jfox using mcmaster.ca> wrote:
> > Dear Jean-Louis,
> > 
> > There must be many ways to do this. Here's one simple way (with no claim of optimality!):
> > 
> > > xc <-  c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
> > > xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)
> > > 
> > > set.seed(123) # for reproducibility
> > > x <- sample(xc, 20, replace=TRUE) # "data"
> > > 
> > > names(xn) <- xc
> > > z <- xn[x]
> > > 
> > > data.frame(z, x)
> >      z  x
> > 1  2.5 2b
> > 2  2.5 2b
> > 3  1.5 1b
> > 4  2.3 2a
> > 5  1.5 1b
> > 6  1.3 1a
> > 7  1.3 1a
> > 8  2.3 2a
> > 9  1.5 1b
> > 10 2.0  2
> > 11 1.7 1c
> > 12 2.3 2a
> > 13 2.3 2a
> > 14 1.0  1
> > 15 1.3 1a
> > 16 1.5 1b
> > 17 2.7 2c
> > 18 2.0  2
> > 19 1.5 1b
> > 20 1.5 1b
> > 
> > I hope this helps,
> >  John
> > 
> >   -----------------------------
> >   John Fox, Professor Emeritus
> >   McMaster University
> >   Hamilton, Ontario, Canada
> >   Web: http://socserv.mcmaster.ca/jfox
> > 
> > > On Jul 10, 2020, at 1:50 PM, Jean-Louis Abitbol <abitbol using sent.com> wrote:
> > > 
> > > Dear All
> > > 
> > > I have a character vector,  representing histology stages, such as for example:
> > > xc <-  c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
> > > 
> > > and this goes on to 3, 3a etc in various order for each patient. I do have of course a pre-established  classification available which does change according to the histology criteria under assessment.
> > > 
> > > I would want to convert xc, for plotting reasons, to a numeric vector such as
> > > 
> > > xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)
> > > 
> > > Unfortunately I have no clue on how to do that.
> > > 
> > > Thanks for any help and apologies if I am missing the obvious way to do it.
> > > 
> > > JL
> > > -- 
> > > Verif30042020
> > > 
> > > ______________________________________________
> > > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > 
> 
>
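
A note on the mismatch reported in the quoted benchmark: the suffix-parsing solution turns a, b, c into .1, .2, .3 (their positions in the alphabet divided by 10), whereas the requested values use .3, .5, .7. Below is a hedged sketch of a variant that keeps the parsing idea but reads the offsets from an explicit table. The function name stage_to_numeric and its offsets argument are hypothetical and do not come from any poster in the thread.

```r
xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)

# Hypothetical helper: split each stage into its integer part and letter
# suffix, then look the suffix up in a named table of offsets.
stage_to_numeric <- function(x, offsets = c(a = 0.3, b = 0.5, c = 0.7)) {
  n <- as.integer(sub("[a-z]$", "", x))    # leading digits
  s <- sub("^[0-9]+", "", x)               # trailing letter, "" if none
  d <- offsets[match(s, names(offsets))]   # NA when there is no known suffix
  d[is.na(d)] <- 0                         # no suffix means no offset
  unname(n + d)
}

all.equal(stage_to_numeric(xc), xn)
# [1] TRUE
```

Because the offsets are an argument, a different classification only needs a different table. Note that an unrecognised suffix is silently treated as 0 in this sketch, which may or may not be what is wanted.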

-- 
Verif30042020
