[R] test if elements of a character vector contain letters

Martin Morgan mtmorgan at fhcrc.org
Mon Aug 6 19:04:58 CEST 2012


On 08/06/2012 09:51 AM, Rui Barradas wrote:
> Hello,
>
> Fun as an exercise in vectorization. 30 times faster. Don't look, guess.

 > system.time(res0 <- grepl("[[:alpha:]]", x))
    user  system elapsed
   0.060   0.000   0.061
 > system.time(res1 <- has_letter(x))
    user  system elapsed
   3.728   0.008   3.747
 > all.equal(res0, res1, check.attributes=FALSE)
[1] TRUE
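
For reference, a minimal sketch of the same comparison on a small vector.
sapply() over a character vector names the result by its elements, while
grepl() returns an unnamed logical vector, which is why the check above
uses check.attributes=FALSE. The name has_letter_ascii is hypothetical
(it is the byte-range function from the quoted message), and [[:alpha:]]
is locale-dependent, so the two may disagree on non-ASCII letters.

```r
# Small input in the spirit of the thread: letters, some with digits, plus numbers.
x <- c("a10", "b7", "k", "1", "26")

# Regex approach: one vectorized call, TRUE where any alphabetic character occurs.
res0 <- grepl("[[:alpha:]]", x)

# Byte-range approach from the quoted message (ASCII letters only).
has_letter_ascii <- function(x) {
    sapply(x, function(y) {
        y <- as.integer(charToRaw(y))
        any((65 <= y & y <= 90) | (97 <= y & y <= 122))
    })
}
res1 <- has_letter_ascii(x)

# res1 carries names taken from x, so compare the values only.
identical(res0, unname(res1))
```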

>
> Given up? OK, here it is.
>
>
> is_letter <- function(x, pattern=c(letters, LETTERS)){
>      sapply(x, function(y){
>          any(sapply(pattern, function(z) grepl(z, y, fixed=TRUE)))
>      })
> }
> # test ascii codes, just one loop.
> has_letter <- function(x){
>      sapply(x, function(y){
>          y <- as.integer(charToRaw(y))
>          any((65 <= y & y <= 90) | (97 <= y & y <= 122))
>      })
> }
>
> x <- c(letters, 1:26)
> x[1:10] <- paste(x[1:10], sample(1:10, 10), sep='')
> x <- rep(x, 1e3)
>
> t1 <- system.time(is_letter(x))
> t2 <- system.time(has_letter(x))
> rbind(t1, t2, t1/t2)
>     user.self sys.self elapsed user.child sys.child
> t1     15.69        0   15.74         NA        NA
> t2      0.50        0    0.50         NA        NA
>         31.38      NaN   31.48         NA        NA
>
>
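
If the per-element loop is kept, one possible tightening (a sketch only;
has_letter2 is a hypothetical name) is to use vapply(), which declares
that each element yields exactly one logical value and can drop the
names. It still tests ASCII byte ranges, so it will not see accented or
other non-ASCII letters.

```r
# Hypothetical variant of has_letter(): vapply() fixes the result type
# (one logical per element) and USE.NAMES = FALSE drops the names.
has_letter2 <- function(x) {
    vapply(x, function(y) {
        y <- as.integer(charToRaw(y))
        any((65 <= y & y <= 90) | (97 <= y & y <= 122))
    }, logical(1), USE.NAMES = FALSE)
}

has_letter2(c("a10", "k", "26"))
# Zero-length input gives logical(0); sapply() would give an empty list.
has_letter2(character(0))
```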
> On 06-08-2012 17:25, Liviu Andronic wrote:
>> Dear all
>> I'm pretty sure that I'm approaching the problem in the wrong way.
>> Suppose the following character vector:
>>> x <- c(letters, 1:26)
>>> (x[1:10] <- paste(x[1:10], sample(1:10, 10), sep=''))
>>   [1] "a10" "b7"  "c2"  "d3"  "e6"  "f1"  "g5"  "h8"  "i9"  "j4"
>>> x
>>  [1] "a10" "b7"  "c2"  "d3"  "e6"  "f1"  "g5"  "h8"  "i9"  "j4"  "k"   "l"   "m"   "n"
>> [15] "o"   "p"   "q"   "r"   "s"   "t"   "u"   "v"   "w"   "x"   "y"   "z"   "1"   "2"
>> [29] "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"  "11"  "12"  "13"  "14"  "15"  "16"
>> [43] "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24"  "25"  "26"
>>
>>
>> How do you test whether the elements of the vector contain at least
>> one letter (or at least one digit) and obtain a logical vector of the
>> same length? I came up with the following awkward function:
>> is_letter <- function(x, pattern=c(letters, LETTERS)){
>>      sapply(x, function(y){
>>          any(sapply(pattern, function(z) grepl(z, y, fixed=TRUE)))
>>      })
>> }
>>
>>> is_letter(x)
>>   a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k     l     m     n     o
>>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
>>     p     q     r     s     t     u     v     w     x     y     z     1     2     3     4
>>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
>>     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
>> FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>>    20    21    22    23    24    25    26
>> FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>>> is_letter(x, 0:9)  ##function slightly misnamed
>>   a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k     l     m     n     o
>>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
>>     p     q     r     s     t     u     v     w     x     y     z     1     2     3     4
>> FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
>>     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
>>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
>>    20    21    22    23    24    25    26
>>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
>>
>>
>> Is there a nicer way to do this? Regards
>> Liviu
>>
>>
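
The "letter or digit" variants asked about here can probably be covered
by one small wrapper around grepl() with a POSIX character class. This
is only a sketch: has_class and its class argument are hypothetical
names, and setNames() is used just to mimic the named output of the
sapply() version above.

```r
# Hypothetical helper: TRUE where an element of x contains at least one
# character of the given POSIX class ("alpha", "digit", ...).
has_class <- function(x, class = "alpha") {
    setNames(grepl(sprintf("[[:%s:]]", class), x), x)
}

x <- c("a10", "k", "26")
has_class(x)            # at least one letter
has_class(x, "digit")   # at least one digit
```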
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


