[R] extract fixed width fields from a string

Sun Jan 22 23:18:39 CET 2012

On Sun, Jan 22, 2012 at 03:34:12PM -0500, Sam Steingold wrote:
> > * Petr Savicky <fnivpxl at pf.pnf.pm> [2012-01-20 21:59:51 +0100]:
> >
> > Try the following.
> >
> >   x <- tolower("ThusThisLongWordWithLettersAndDigitsFrom0to9isAnIntegerBase36")
> >   x <- strsplit(x, "")[[1]]
> >   digits <- 0:35
> >   names(digits) <- c(0:9, letters)
> >   y <- digits[x]
> >  
> >   # solution using gmp package
> >   library(gmp)
> >   b <- as.bigz(36)
> >   sum(y * b^(length(y):1 - 1))
> >  
> >   [1]
> > "70455190722800243410669999246294410591724807773749367607882253153084991978813070206061584038994
> 
> thanks, here is what I wrote:
> 
> ## convert a string to an integer in the given base
> digits <- 0:63
> names(digits) <- c(0:9, letters, toupper(letters), "-", "_")
> string2int <- function (str, base=10) {
>   d <- digits[strsplit(str,"")[[1]]]
>   sum(d * base^(length(d):1 - 1))
> }
> 
> and it appears to work.
> however, I want to be able to apply it to all elements of a vector.
> I can use lapply:
> 
> > unlist(lapply(c("100","12","213"),string2int))
> [1] 100  12 213
> 
> but not directly:
> 
> > string2int(c("100","12","213"))
> [1] 100

Hi.

Here, you get the result only for the first string, because
strsplit(str,"") returns a list with one element per input
string and "[[1]]" extracts only the first of them.
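
To see this, one can look at what strsplit() returns for a
character vector. This is only a small illustration; the
variable names are chosen for the example.

```r
str <- c("100", "12", "213")
parts <- strsplit(str, "")   # a list with one character vector per input string
length(parts)                # 3
parts[[1]]                   # "1" "0" "0": only the digits of the first string
```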

As suggested by Michael, a matrix can be used if
the input is a character vector whose components
all have the same number of characters (nchar).

  strings2int <- function (str, base=10) {
    m <- length(str)
    n <- unique(nchar(str))
    stopifnot(length(n) == 1) # all strings must have the same nchar()
    ch <- unlist(strsplit(str, "")) # all characters, string by string
    d <- matrix(digits[ch], nrow=m, ncol=n, byrow=TRUE) # one row per string
    c(d %*% base^(n:1 - 1)) # weight each column by its power of the base
  }

  strings2int(c("100","012","213","453"))

  [1] 100  12 213 453

  strings2int(c("100","12","213","453"))

  Error: length(n) == 1 is not TRUE
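
If the strings may have different lengths, one possible
alternative (not the matrix approach above) is to convert
each element of the strsplit() result separately with
vapply(). The following is only a sketch. The name
strings2int_any is made up for this example, and the digits
table is redefined here on the assumption that "-" and "_"
are meant to be the digits 62 and 63.

```r
# lookup table from digit characters to their values (assumed alphabet)
digits <- 0:63
names(digits) <- c(0:9, letters, LETTERS, "-", "_")

# hypothetical helper: converts each string on its own, so lengths may differ
strings2int_any <- function(str, base = 10) {
  vapply(strsplit(str, ""), function(ch) {
    d <- digits[ch]                         # digit values of one string
    sum(d * base^(rev(seq_along(d)) - 1))   # positional weights, highest first
  }, numeric(1))
}

strings2int_any(c("100", "12", "213", "453"))
```

This is essentially the lapply() call from the original
question wrapped into a function, so it is slower than the
matrix product for long vectors, but it does not need
padding with leading zeros.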

Petr.