[R] test if elements of a character vector contain letters

Marc Schwartz marc_schwartz at me.com
Mon Aug 6 19:35:13 CEST 2012


On Aug 6, 2012, at 12:06 PM, Marc Schwartz <marc_schwartz at me.com> wrote:

> Perhaps I am missing something, but why use sapply() when grepl() is already vectorized?
> 
> is.letter <- function(x) grepl("[:alpha:]", x)
> is.number <- function(x) grepl("[:digit:]", x)

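Sorry, the above has typos from my copy and paste: POSIX character classes such as [:alpha:] only work inside a bracket expression, so they need double brackets. It should be: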

is.letter <- function(x) grepl("[[:alpha:]]", x)
is.number <- function(x) grepl("[[:digit:]]", x)
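
As a small illustration of why the double brackets matter (a sketch on a made-up vector, not part of the original test data): with single brackets the pattern is an ordinary bracket expression that matches any one of the characters ':', 'a', 'l', 'p' or 'h', so it only appears to work on some inputs.

```r
# Made-up test vector for illustration only
x <- c("a1", "b", "7", ":", "")

# Single brackets: a plain set of the characters : a l p h
wrong <- grepl("[:alpha:]", x)
wrong
# [1]  TRUE FALSE FALSE  TRUE FALSE

# Double brackets: the POSIX class inside a bracket expression
is.letter <- function(x) grepl("[[:alpha:]]", x)
is.number <- function(x) grepl("[[:digit:]]", x)

is.letter(x)
# [1]  TRUE  TRUE FALSE FALSE FALSE
is.number(x)
# [1]  TRUE FALSE  TRUE FALSE FALSE
```

Note that "b" is missed and ":" is wrongly accepted by the single-bracket pattern.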

Marc

> 
> x <- c(letters, 1:26)
> 
> x[1:10] <- paste(x[1:10], sample(1:10, 10), sep='')
> 
> x <- rep(x, 1e3)
> 
>> str(x)
> chr [1:52000] "a2" "b10" "c8" "d3" "e6" "f1" "g5" ...
> 
>> system.time(is.letter(x))
>   user  system elapsed 
>  0.011   0.000   0.010 
> 
>> system.time(is.number(x))
>   user  system elapsed 
>  0.010   0.000   0.011 
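
A quick consistency check, as a sketch only: on the thread's test vector the vectorized grepl() version should give the same answers as the sapply() version from the original post. The digits are appended deterministically here instead of with sample(), which is an assumption made so the check is reproducible.

```r
# Vectorized version (corrected pattern)
is.letter <- function(x) grepl("[[:alpha:]]", x)

# sapply()-based version from the original post
is_letter <- function(x, pattern = c(letters, LETTERS)) {
  sapply(x, function(y) {
    any(sapply(pattern, function(z) grepl(z, y, fixed = TRUE)))
  })
}

x <- c(letters, 1:26)
x[1:10] <- paste0(x[1:10], 1:10)  # deterministic stand-in for sample(1:10, 10)

# sapply() attaches the elements of x as names; grepl() returns a bare vector
same <- identical(is.letter(x), unname(is_letter(x)))
same
# [1] TRUE
sum(is.letter(x))
# [1] 26
```

The only difference is the names attribute, which unname() drops before the comparison.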
> 
> 
> Regards,
> 
> Marc Schwartz
> 
> On Aug 6, 2012, at 11:51 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
> 
>> Hello,
>> 
>> This was fun as an exercise in vectorization: the second function below is about 30 times faster. Don't look, guess.
>> 
>> Given up? OK, here it is.
>> 
>> 
>> is_letter <- function(x, pattern=c(letters, LETTERS)){
>>   sapply(x, function(y){
>>       any(sapply(pattern, function(z) grepl(z, y, fixed=TRUE)))
>>   })
>> }
>> # Test the ASCII codes directly, using just one loop.
>> has_letter <- function(x){
>>   sapply(x, function(y){
>>       y <- as.integer(charToRaw(y))
>>       any((65 <= y & y <= 90) | (97 <= y & y <= 122))
>>   })
>> }
>> 
>> x <- c(letters, 1:26)
>> x[1:10] <- paste(x[1:10], sample(1:10, 10), sep='')
>> x <- rep(x, 1e3)
>> 
>> t1 <- system.time(is_letter(x))
>> t2 <- system.time(has_letter(x))
>> rbind(t1, t2, t1/t2)
>>  user.self sys.self elapsed user.child sys.child
>> t1     15.69        0   15.74         NA        NA
>> t2      0.50        0    0.50         NA        NA
>>      31.38      NaN   31.48         NA        NA
>> 
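
A possible variant of the ASCII-code approach, as a sketch only: vapply() guarantees a logical result even for zero-length input, and utf8ToInt() yields the code points directly. The name has_letter_ascii is hypothetical. Like has_letter() above, it only recognizes the unaccented letters A-Z and a-z, whereas the interpretation of [[:alpha:]] depends on the locale.

```r
# Hypothetical helper: TRUE if an element contains at least one ASCII letter
has_letter_ascii <- function(x) {
  vapply(x, function(y) {
    codes <- utf8ToInt(y)  # integer code points of one string
    any((codes >= 65L & codes <= 90L) | (codes >= 97L & codes <= 122L))
  }, logical(1), USE.NAMES = FALSE)
}

has_letter_ascii(c("a1", "B", "7", ""))
# [1]  TRUE  TRUE FALSE FALSE
has_letter_ascii(character(0))
# logical(0)
```

This sketch does not handle NA elements; those would need a separate check.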
>> 
>> On 06-08-2012 17:25, Liviu Andronic wrote:
>>> Dear all,
>>> I'm pretty sure that I'm approaching the problem in the wrong way.
>>> Suppose the following character vector:
>>>> (x[1:10] <- paste(x[1:10], sample(1:10, 10), sep=''))
>>> [1] "a10" "b7"  "c2"  "d3"  "e6"  "f1"  "g5"  "h8"  "i9"  "j4"
>>>> x
>>>  [1] "a10" "b7"  "c2"  "d3"  "e6"  "f1"  "g5"  "h8"  "i9"  "j4"  "k"   "l"   "m"   "n"
>>> [15] "o"   "p"   "q"   "r"   "s"   "t"   "u"   "v"   "w"   "x"   "y"   "z"   "1"   "2"
>>> [29] "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"  "11"  "12"  "13"  "14"  "15"  "16"
>>> [43] "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24"  "25"  "26"
>>> 
>>> 
>>> How do you test whether each element of the vector contains at least
>>> one letter (or at least one digit) and obtain a logical vector of the
>>> same length? I came up with the following awkward function:
>>> is_letter <- function(x, pattern=c(letters, LETTERS)){
>>>    sapply(x, function(y){
>>>        any(sapply(pattern, function(z) grepl(z, y, fixed=TRUE)))
>>>    })
>>> }
>>> 
>>>> is_letter(x)
>>>   a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k     l     m     n     o
>>>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
>>>     p     q     r     s     t     u     v     w     x     y     z     1     2     3     4
>>>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
>>>     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
>>> FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>>>    20    21    22    23    24    25    26
>>> FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>>>> is_letter(x, 0:9)  ##function slightly misnamed
>>>   a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k     l     m     n     o
>>>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
>>>     p     q     r     s     t     u     v     w     x     y     z     1     2     3     4
>>> FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
>>>     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
>>>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
>>>    20    21    22    23    24    25    26
>>>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
>>> 
>>> 
>>> Is there a nicer way to do this? Regards
>>> Liviu
> 


