[R] Character (1a, 1b) to numeric

Bert Gunter bgunter.4567 using gmail.com
Fri Jul 10 22:26:37 CEST 2020


Thanks! As I said, cute exercise.

Best,
Bert




On Fri, Jul 10, 2020 at 1:21 PM Fox, John <jfox using mcmaster.ca> wrote:

> Dear Bert,
>
> Wouldn't you know it, but your contribution arrived just after I pressed
> "send" on my last message? So here's how your solution compares:
>
> > microbenchmark(John = John <- xn[x],
> +                Rich = Rich <- xn[match(x, xc)],
> +                Jeff = Jeff <- {
> +                   n <- as.integer( sub( "[a-i]$", "", x ) )
> +                   d <- match( sub( "^\\d+", "", x ), letters[1:9] )
> +                   d[ is.na( d ) ] <- 0
> +                   n + d / 10
> +                },
> +                David = David <- as.numeric(gsub("a", ".3",
> +                                      gsub("b", ".5",
> +                                           gsub("c", ".7", x)))),
> +                Bert = Bert <- {
> +                   nums <- sub("[[:alpha:]]+","",x)
> +                   alph <- sub("\\d+","",x)
> +                   as.numeric(nums) + ifelse(alph == "",0, vals[alph])
> +                },
> +                times=1000L
> +                )
> Unit: microseconds
>   expr       min         lq       mean    median         uq       max neval  cld
>   John   261.739   373.9765   599.9411   536.571   569.3750  14489.48  1000 a
>   Rich   250.697   372.4450   542.3208   520.383   554.7215  10682.73  1000 a
>   Jeff 10879.223 13477.7665 15647.7856 15549.255 17516.7420 146155.28  1000  b
>  David 14337.510 18375.0100 20325.8796 20187.174 22161.0195  32575.31  1000    d
>   Bert 12344.506 15753.2510 18024.2757 17702.838 19973.0465  32043.80  1000   c
> > all.equal(John, Rich)
> [1] TRUE
> > all.equal(John, David)
> [1] "names for target but not for current"
> > all.equal(John, Jeff)
> [1] "names for target but not for current" "Mean relative difference: 0.1498243"
> > all.equal(John, Bert)
> [1] "names for target but not for current"
>
> To make the comparison fair, I moved the parts of the solutions that don't
> depend on the length of the data outside the benchmark. Your solution does
> have the virtue of providing the right answer.
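
The objects the benchmark relies on (xc, xn, vals, x) are not shown in this message. Below is a minimal sketch that rebuilds them from the definitions given elsewhere in the thread; the sample size n is a hypothetical value, not taken from the message. It also suggests that the "names for target but not for current" messages go away once names are dropped with unname(), so that only the values are compared.

```r
# Setup assumed to sit outside the timed expressions (reconstructed from the
# thread; the sample size n is a guess, not taken from the original message).
xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)
names(xn) <- xc                                 # lookup table for John's solution
vals <- setNames(c(.3, .5, .7), letters[1:3])   # code values for Bert's solution

set.seed(123)
n <- 1000L                                      # hypothetical data size
x <- sample(xc, n, replace = TRUE)

John  <- xn[x]                  # lookup by name; the result carries names
Rich  <- xn[match(x, xc)]       # lookup by position; also carries names from xn
David <- as.numeric(gsub("a", ".3", gsub("b", ".5", gsub("c", ".7", x))))
nums  <- sub("[[:alpha:]]+", "", x)
alph  <- sub("\\d+", "", x)
Bert  <- as.numeric(nums) + ifelse(alph == "", 0, vals[alph])

# Dropping the names restricts the comparison to the values themselves.
all.equal(unname(John), David)
all.equal(unname(John), Bert)
```
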
>
> Best,
>  John
>
> > On Jul 10, 2020, at 3:54 PM, Bert Gunter <bgunter.4567 using gmail.com> wrote:
> >
> > ... and continuing with this cute little thread...
> >
> > I found the OP's specification a little imprecise -- are your values
> > always a string that begins with "some sort" of numeric value followed by
> > "some sort" of alpha code? That is, could the numeric value be several
> > digits and the alpha code several letters? Probably not, and the existing
> > solutions you have been provided are almost certainly all you need. But for
> > fun, assuming this more general specification, here is a general way to
> > split your alphanumeric codes up into numeric and alpha parts and then
> > convert by using a couple of sub()'s.
> >
> > > set.seed(131)
> > > xc <- sample(c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c"), 15, replace = TRUE)
> > > nums <- sub("[[:alpha:]]+","",xc)  ## extract numeric part
> > > alph <- sub("\\d+","",xc)   ## extract alpha part
> > > codes <- letters[1:3] ## whatever alpha codes are used
> > > vals <- setNames(c(.3,.5,.7), codes) ## whatever numeric values to convert codes to
> > > xnew <- as.numeric(nums) + ifelse(alph == "",0, vals[alph])
> > > data.frame (xc = xc, xnew = xnew)
> >    xc xnew
> > 1  1a  1.3
> > 2   2  2.0
> > 3  1c  1.7
> > 4  1c  1.7
> > 5  1b  1.5
> > 6  1a  1.3
> > 7   2  2.0
> > 8   2  2.0
> > 9  1a  1.3
> > 10 1a  1.3
> > 11 2c  2.7
> > 12 1b  1.5
> > 13 1b  1.5
> > 14  1  1.0
> > 15 1c  1.7
> >
> > Echoing others, no claim for optimality in any sense.
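
To illustrate the more general specification, here is a small sketch of the same two sub() calls applied to multi-digit numbers and multi-letter codes. The codes and values below are made up for illustration and are not from the original data.

```r
# Hypothetical codes with multi-digit numbers and multi-letter suffixes;
# the stages and their values are illustrative only.
xc   <- c("12", "12ab", "3c", "3")
vals <- c(ab = 0.4, c = 0.7)

nums <- sub("[[:alpha:]]+", "", xc)   # drop the letters, keep the digits
alph <- sub("\\d+", "", xc)           # drop the digits, keep the letters

# Entries without an alpha code contribute 0; the others are looked up by name.
xnew <- as.numeric(nums) + ifelse(alph == "", 0, vals[alph])
data.frame(xc = xc, xnew = xnew)
```

A suffix that is missing from vals gives NA rather than an error, which can serve as a cheap check for unexpected codes.
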
> >
> > Cheers,
> > Bert
> >
> >
> > On Fri, Jul 10, 2020 at 12:28 PM David Carlson <dcarlson using tamu.edu> wrote:
> > Here is a different approach:
> >
> > xc <-  c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
> > xn <- as.numeric(gsub("a", ".3", gsub("b", ".5", gsub("c", ".7", xc))))
> > xn
> > # [1] 1.0 1.3 1.5 1.7 2.0 2.3 2.5 2.7
> >
> > David L Carlson
> > Professor Emeritus of Anthropology
> > Texas A&M University
> >
> > On Fri, Jul 10, 2020 at 1:10 PM Fox, John <jfox using mcmaster.ca> wrote:
> >
> > > Dear Jean-Louis,
> > >
> > > There must be many ways to do this. Here's one simple way (with no claim
> > > of optimality!):
> > >
> > > > xc <-  c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
> > > > xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)
> > > >
> > > > set.seed(123) # for reproducibility
> > > > x <- sample(xc, 20, replace=TRUE) # "data"
> > > >
> > > > names(xn) <- xc
> > > > z <- xn[x]
> > > >
> > > > data.frame(z, x)
> > >      z  x
> > > 1  2.5 2b
> > > 2  2.5 2b
> > > 3  1.5 1b
> > > 4  2.3 2a
> > > 5  1.5 1b
> > > 6  1.3 1a
> > > 7  1.3 1a
> > > 8  2.3 2a
> > > 9  1.5 1b
> > > 10 2.0  2
> > > 11 1.7 1c
> > > 12 2.3 2a
> > > 13 2.3 2a
> > > 14 1.0  1
> > > 15 1.3 1a
> > > 16 1.5 1b
> > > 17 2.7 2c
> > > 18 2.0  2
> > > 19 1.5 1b
> > > 20 1.5 1b
> > >
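
Since the stages are said to continue to 3, 3a, etc., the lookup table could also be generated rather than typed out. A minimal sketch, assuming the same suffixes and offsets (.3, .5, .7) apply at every stage; that assumption, and the names stages, suffix and offset, are not from the original post.

```r
# Assumed structure: every stage takes the same suffixes with the same offsets.
stages <- 1:3
suffix <- c("", "a", "b", "c")
offset <- c(0, .3, .5, .7)

# outer() builds a stage-by-suffix matrix; t() plus as.vector() flattens it
# stage by stage: "1", "1a", "1b", "1c", "2", ...
xc <- as.vector(t(outer(stages, suffix, paste0)))
xn <- as.vector(t(outer(stages, offset, `+`)))
names(xn) <- xc

# Look up any observed codes by name, as in the example above.
xn[c("3b", "1", "2c")]
```
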
> > > I hope this helps,
> > >  John
> > >
> > >   -----------------------------
> > >   John Fox, Professor Emeritus
> > >   McMaster University
> > >   Hamilton, Ontario, Canada
> > >   Web: http://socserv.mcmaster.ca/jfox
> > >
> > > > On Jul 10, 2020, at 1:50 PM, Jean-Louis Abitbol <abitbol using sent.com> wrote:
> > > >
> > > > Dear All
> > > >
> > > > I have a character vector representing histology stages, for example:
> > > > xc <-  c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
> > > >
> > > > and this goes on to 3, 3a, etc., in varying order for each patient. I do,
> > > > of course, have a pre-established classification available, which does
> > > > change according to the histology criteria under assessment.
> > > >
> > > > I would like to convert xc, for plotting purposes, to a numeric vector
> > > > such as
> > > >
> > > > xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)
> > > >
> > > > Unfortunately I have no clue on how to do that.
> > > >
> > > > Thanks for any help and apologies if I am missing the obvious way to do
> > > > it.
> > > >
> > > > JL
> > > > --
> > > > Verif30042020
> > > >
> > > > ______________________________________________
> > > > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.
> > >
> > >
> >
>
>
