[R] extract fixed width fields from a string
Petr Savicky
savicky at cs.cas.cz
Sun Jan 22 23:18:39 CET 2012
On Sun, Jan 22, 2012 at 03:34:12PM -0500, Sam Steingold wrote:
> > * Petr Savicky <savicky at cs.cas.cz> [2012-01-20 21:59:51 +0100]:
> >
> > Try the following.
> >
> > x <- tolower("ThusThisLongWordWithLettersAndDigitsFrom0to9isAnIntegerBase36")
> > x <- strsplit(x, "")[[1]]
> > digits <- 0:35
> > names(digits) <- c(0:9, letters)
> > y <- digits[x]
> >
> > # solution using gmp package
> > library(gmp)
> > b <- as.bigz(36)
> > sum(y * b^(length(y):1 - 1))
> >
> > [1] "70455190722800243410669999246294410591724807773749367607882253153084991978813070206061584038994
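For comparison, base R's strtoi() converts strings in any base from 2 to 36 and is vectorized, but it returns an integer, so any value above .Machine$integer.max comes back as NA. That is why a long string like the one above needs an arbitrary-precision package such as gmp. A minimal sketch (the example strings are illustrative only):

```r
# strtoi() accepts bases 2..36 and works element-wise on a character vector
strtoi(c("100", "zz"), base = 36L)          # 36^2 = 1296 and 35*36 + 35 = 1295
strtoi(c("100", "12", "213"), base = 10L)   # 100 12 213

# Values that would overflow a 32-bit integer are returned as NA
strtoi("thusthislongword", base = 36L)
```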
>
> thanks, here is what I wrote:
>
> ## convert a string to an integer in the given base
> digits <- 0:63
> names(digits) <- c(0:9, letters, toupper(letters), "-", "_")
> string2int <- function (str, base=10) {
> d <- digits[strsplit(str,"")[[1]]]
> sum(d * base^(length(d):1 - 1))
> }
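A possible variant of string2int() evaluates the digits with Horner's rule through Reduce(), so the powers base^(k-1) are never formed explicitly. This is only a sketch. The name string2int_horner is hypothetical, and the lookup table is repeated so the block runs on its own, with "-" and "_" as separate names so all 64 digit values are covered. Like the original, it uses double-precision arithmetic, so results are exact only up to 2^53.

```r
# Lookup table: symbol -> digit value (0-63)
digits <- 0:63
names(digits) <- c(0:9, letters, toupper(letters), "-", "_")

# Horner's rule: ((d1 * base + d2) * base + d3) * base + ...
string2int_horner <- function(str, base = 10) {
  d <- digits[strsplit(str, "")[[1]]]
  Reduce(function(acc, digit) acc * base + digit, d, 0)
}

string2int_horner("213")            # 213
string2int_horner("zz", base = 36)  # 1295
```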
>
> and it appears to work.
> However, I want to be able to apply it to all elements of a vector.
> I can use lapply:
>
> > unlist(lapply(c("100","12","213"),string2int))
> [1] 100 12 213
>
> but not directly:
>
> > string2int(c("100","12","213"))
> [1] 100
Hi.
Here you get the result only for the first string, because
strsplit(str, "") returns a list with one component per string
and "[[1]]" keeps only the first component.
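If the strings may have different lengths, vapply() is a compact alternative to unlist(lapply(...)). It calls the scalar function once per string and checks that each call returns exactly one number. This is a minimal sketch. The lookup table and string2int() are repeated (with "-" and "_" as separate digit names) so that it runs on its own, and strings2int_apply is a hypothetical name.

```r
digits <- 0:63
names(digits) <- c(0:9, letters, toupper(letters), "-", "_")

# Scalar version: converts a single string
string2int <- function(str, base = 10) {
  d <- digits[strsplit(str, "")[[1]]]
  sum(d * base^(length(d):1 - 1))
}

# Vector version: one call per element; FUN.VALUE = numeric(1) enforces
# a single numeric result per string
strings2int_apply <- function(str, base = 10) {
  vapply(str, string2int, numeric(1), base = base, USE.NAMES = FALSE)
}

strings2int_apply(c("100", "12", "213"))  # 100 12 213
```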
As suggested by Michael, a matrix can be used if all components
of the input character vector have the same number of characters
(nchar).
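strings2int <- function (str, base=10) {
m <- length(str)                 # number of strings
n <- unique(nchar(str))          # common number of characters
stopifnot(length(n) == 1) # test of all nchar() equal
ch <- strsplit(str, "")
ch <- unlist(ch)                 # all characters, string after string
# one row per string, one column per digit position ("digits" as defined above)
d <- matrix(digits[ch], nrow=m, ncol=n, byrow=TRUE)
c(d %*% base^(n:1 - 1))          # weight each column by its power of base
}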
strings2int(c("100","012","213","453"))
[1] 100 12 213 453
strings2int(c("100","12","213","453"))
Error: length(n) == 1 is not TRUE
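The stopifnot() check can be avoided by left-padding the shorter strings with "0". That character is digit value 0 in every base, so padding does not change any value. The sketch below works under that assumption. strings2int_padded is a hypothetical name, strrep() requires R >= 3.3.0, and results are exact only up to 2^53 in double precision. The lookup table is repeated so the block runs on its own.

```r
digits <- 0:63
names(digits) <- c(0:9, letters, toupper(letters), "-", "_")

strings2int_padded <- function(str, base = 10) {
  n <- max(nchar(str))
  # Left-pad with "0" to a common width; leading zeros keep every value unchanged
  str <- paste0(strrep("0", n - nchar(str)), str)
  ch <- unlist(strsplit(str, ""))
  # One row per string, one column per digit position
  d <- matrix(digits[ch], nrow = length(str), ncol = n, byrow = TRUE)
  c(d %*% base^(n:1 - 1))
}

strings2int_padded(c("100", "12", "213", "453"))  # 100 12 213 453
```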
Petr.