[R] String processing - is there a better way

Gabor Grothendieck ggrothendieck at gmail.com
Wed Jul 21 19:41:38 CEST 2010


On Wed, Jul 21, 2010 at 1:02 PM, Davis, Brian <Brian.Davis at uth.tmc.edu> wrote:
> I have a two part question
>
> Part 1)
> I am trying to remove characters in a string based on the position of a key character in another string.  I have a solution that works but it requires a for-loop.  A vectorized way of doing this has eluded me.
>
> CleanRead<-function(x,y) {
>
>   if (!is.character(x))
>     x <- as.character(x)
>   if (!is.character(y))
>     y <- as.character(y)
>
>   idx<-grep("\\*", x, value=FALSE)
>   starpos<-gregexpr("\\*", x[idx])
>
>   ysplit<-strsplit(y[idx], '')
>   n<-length(idx)
>   for(i in 1:n) {
>     ysplit[[i]][starpos[[i]]] = ""
>   }
>
>   y[idx]<-unlist(lapply(ysplit, paste, sep='', collapse=''))
>   return(y)
> }
>
> x<-c("AA*.*A,,,", "**a.a*,,,A", "C*c..", "**aA")
> y<-c("abcdefghi", "abcdefghij", "abcde", "abcd")
>
> CleanRead(x,y)
> [1] "abdfghi" "cdeghij" "acde"    "cd"
>
>
> Is there a better way to do this?
>
> Part 2)
> My next step in the string processing is to take the characters in the output of CleanRead and subtract 33 from the ASCII value of each character to obtain an integer. Again I have a solution that works, involving splitting the string into characters then converting them to factors (starting at ASCII 34) and using unclass to get the integer value. (kind of an atoi(x)-33 all in one step)
>
> I looked for the C equivalent of atoi, but the only help I could find (R-help 2003) suggested using as.numeric.  However, the help file (and testing) shows you get 'NA'.
>

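This splits x and y into vectors of single characters, keeps the
characters of y at positions where x is not "*", and then matches the
result against letters to return each character's position in the
alphabet (a = 1, b = 2, ...). With the example data the results differ
in length, so mapply returns a list with one integer vector per input
string.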

# x and y are the character vectors defined in the question above
f <- function(x, y) match(y[x != "*"], letters)
mapply(f, strsplit(x, ""), strsplit(y, ""))
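Since match(..., letters) returns alphabet positions rather than ASCII codes, here is a minimal sketch of a variant that follows the question more literally: it rebuilds the cleaned strings (Part 1) and then subtracts 33 from each character's code (Part 2). The helper name keep and the variables kept, cleaned and scores are made up for illustration, and the sketch assumes that each element of x has the same number of characters as the corresponding element of y and that y contains only ASCII characters.

```r
# x marks the positions to drop with "*"; y holds the characters to filter.
x <- c("AA*.*A,,,", "**a.a*,,,A", "C*c..", "**aA")
y <- c("abcdefghi", "abcdefghij", "abcde", "abcd")

# Hypothetical helper: keep the characters of ys wherever xs is not "*".
keep <- function(xs, ys) ys[xs != "*"]
kept <- mapply(keep, strsplit(x, ""), strsplit(y, ""), SIMPLIFY = FALSE)

# Part 1: paste the surviving characters back into one string per element.
cleaned <- vapply(kept, paste, character(1), collapse = "")
cleaned
# [1] "abdfghi" "cdeghij" "acde"    "cd"

# Part 2: code point of each character minus 33 (assumes ASCII input).
scores <- lapply(cleaned, function(s) utf8ToInt(s) - 33L)
scores[[4]]
# [1] 66 67
```

utf8ToInt works on a single string, so lapply is used to apply it to each cleaned string in turn. The loop over elements is still there, just hidden inside mapply and lapply, so this is tidier than the explicit for-loop rather than truly vectorized.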


