[R] test if elements of a character vector contain letters
Rui Barradas
ruipbarradas at sapo.pt
Mon Aug 6 18:51:04 CEST 2012
Hello,
This was fun as an exercise in vectorization. The version below is about 30 times faster. Don't look, guess.
Given up? OK, here it is.
# Original version: two nested sapply() loops, one grepl() call per pattern.
is_letter <- function(x, pattern=c(letters, LETTERS)){
    sapply(x, function(y){
        any(sapply(pattern, function(z) grepl(z, y, fixed=TRUE)))
    })
}
# Test ASCII codes, just one loop.
has_letter <- function(x){
    sapply(x, function(y){
        y <- as.integer(charToRaw(y))
        # ASCII codes 65-90 are A-Z, 97-122 are a-z
        any((65 <= y & y <= 90) | (97 <= y & y <= 122))
    })
}
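To see why the range test works, here is a small sketch of what charToRaw() gives for a single element. The sample strings and the variable names are mine, chosen only for illustration, and the check covers unaccented ASCII letters only.

```r
# Illustration only: sample strings and names are assumptions, not from the thread.
codes <- as.integer(charToRaw("a10"))
codes
# [1] 97 49 48   ("a" is 97, "1" is 49, "0" is 48)

# One vectorized comparison over all bytes of the string
is_ascii_letter <- (65 <= codes & codes <= 90) | (97 <= codes & codes <= 122)
is_ascii_letter
# [1]  TRUE FALSE FALSE
any(is_ascii_letter)
# [1] TRUE

# A string made only of digits has no byte in either letter range
digit_codes <- as.integer(charToRaw("26"))
any((65 <= digit_codes & digit_codes <= 90) | (97 <= digit_codes & digit_codes <= 122))
# [1] FALSE
```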
x <- c(letters, 1:26)
x[1:10] <- paste(x[1:10], sample(1:10, 10), sep='')
x <- rep(x, 1e3)
t1 <- system.time(is_letter(x))
t2 <- system.time(has_letter(x))
rbind(t1, t2, t1/t2)
   user.self sys.self elapsed user.child sys.child
t1     15.69        0   15.74         NA        NA
t2      0.50        0    0.50         NA        NA
       31.38      NaN   31.48         NA        NA
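As a further point of comparison, and not something I timed above, a single grepl() call with a character class avoids the R-level loop altogether, since grepl() is already vectorized over its second argument. A minimal sketch, assuming plain ASCII input; the names has_letter_re and has_digit_re are hypothetical and mine:

```r
# Hypothetical helper names; these are not the functions timed above.
has_letter_re <- function(x) grepl("[A-Za-z]", x)
has_digit_re  <- function(x) grepl("[0-9]", x)

s <- c("a10", "k", "26")
has_letter_re(s)
# [1]  TRUE  TRUE FALSE
has_digit_re(s)
# [1]  TRUE FALSE  TRUE
```

Unlike the sapply() versions, the result carries no names; setNames(has_letter_re(s), s) would add them if wanted.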
On 06-08-2012 17:25, Liviu Andronic wrote:
> Dear all
> I'm pretty sure that I'm approaching the problem in a wrong way.
> Suppose the following character vector:
>> (x[1:10] <- paste(x[1:10], sample(1:10, 10), sep=''))
> [1] "a10" "b7" "c2" "d3" "e6" "f1" "g5" "h8" "i9" "j4"
>> x
>  [1] "a10" "b7"  "c2"  "d3"  "e6"  "f1"  "g5"  "h8"  "i9"  "j4"  "k"   "l"   "m"   "n"
> [15] "o"   "p"   "q"   "r"   "s"   "t"   "u"   "v"   "w"   "x"   "y"   "z"   "1"   "2"
> [29] "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"  "11"  "12"  "13"  "14"  "15"  "16"
> [43] "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24"  "25"  "26"
>
>
> How do you test whether the elements of the vector contain at least
> one letter (or at least one digit) and obtain a logical vector of the
> same dimension? I came up with the following awkward function:
> is_letter <- function(x, pattern=c(letters, LETTERS)){
>     sapply(x, function(y){
>         any(sapply(pattern, function(z) grepl(z, y, fixed=T)))
>     })
> }
>
>> is_letter(x)
>   a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k     l     m     n     o
>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
>     p     q     r     s     t     u     v     w     x     y     z     1     2     3     4
>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
>     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
> FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>    20    21    22    23    24    25    26
> FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>> is_letter(x, 0:9) ##function slightly misnamed
>   a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k     l     m     n     o
>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
>     p     q     r     s     t     u     v     w     x     y     z     1     2     3     4
> FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
>     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
>    20    21    22    23    24    25    26
>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
>
>
> Is there a nicer way to do this? Regards
> Liviu
>
>