[R] test if elements of a character vector contain letters
David L Carlson
dcarlson at tamu.edu
Mon Aug 6 19:39:06 CEST 2012
Marc's functions need only an extra set of brackets around the character class:
is.letter <- function(x) grepl("[[:alpha:]]", x)
is.number <- function(x) grepl("[[:digit:]]", x)
Without them, the functions are fast, but wrong.
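
A minimal sketch of why the single-bracket pattern misbehaves, assuming R's default regex engine (not perl=TRUE): "[:alpha:]" is read as an ordinary bracket expression listing the literal characters ':', 'a', 'l', 'p' and 'h', whereas "[[:alpha:]]" places the POSIX class inside a bracket expression. The vector `chars` and the names `single` and `double` are made up for illustration.

```r
# Hypothetical probe vector: the five characters the single-bracket
# pattern lists literally, plus two other letters and a digit.
chars <- c("a", "l", "p", "h", ":", "b", "z", "7")

# Single brackets: matches only the literal characters : a l p h
single <- grepl("[:alpha:]", chars)

# Double brackets: matches any alphabetic character
double <- grepl("[[:alpha:]]", chars)

# Side-by-side view of the two results
print(data.frame(chars, single, double))
```

This is consistent with the transcript that follows, where only the elements containing a, h, l or p come back TRUE with the single-bracket version.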
> x
[1] "a8" "b5" "c10" "d1" "e6" "f2" "g4" "h3" "i7" "j9" "k" "l"
[13] "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x"
[25] "y" "z" "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"
[37] "11" "12" "13" "14" "15" "16" "17" "18" "19" "20" "21" "22"
[49] "23" "24" "25" "26"
> is.letter <- function(x) grepl("[:alpha:]", x)
> is.letter(x)
[1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE
[13] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE
> is.letter <- function(x) grepl("[[:alpha:]]", x)
> is.letter(x)
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[13] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[25] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE
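
As a rough cross-check, the corrected grepl() versions can be compared with the ASCII-code approach Rui posted earlier in the thread. This is only a sketch: has_letter() below is adapted from his code (vapply() in place of sapply()), the digits are fixed rather than drawn with sample() so the result is reproducible, and the names `agree` and `both` are illustrative.

```r
is.letter <- function(x) grepl("[[:alpha:]]", x)
is.number <- function(x) grepl("[[:digit:]]", x)

# ASCII-range test adapted from Rui Barradas's has_letter() in this thread
has_letter <- function(x) {
  vapply(x, function(y) {
    codes <- as.integer(charToRaw(y))
    any((65 <= codes & codes <= 90) | (97 <= codes & codes <= 122))
  }, logical(1), USE.NAMES = FALSE)
}

# Same shape as the thread's test vector, but with fixed digits
# (an assumption made here for reproducibility)
x <- c(letters, 1:26)
x[1:10] <- paste0(x[1:10], 1:10)

# Do the regex and ASCII-code versions agree on this input?
agree <- identical(is.letter(x), has_letter(x))

# Elements that contain both a letter and a digit
both <- x[is.letter(x) & is.number(x)]

print(agree)
print(both)
```

The two approaches should agree on ASCII-only input like this; they can differ on accented or other non-ASCII letters, since [[:alpha:]] depends on the locale while the ASCII ranges do not.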
----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Marc Schwartz
> Sent: Monday, August 06, 2012 12:07 PM
> To: Rui Barradas
> Cc: r-help
> Subject: Re: [R] test if elements of a character vector contain letters
>
> Perhaps I am missing something, but why use sapply() when grepl() is
> already vectorized?
>
> is.letter <- function(x) grepl("[:alpha:]", x)
> is.number <- function(x) grepl("[:digit:]", x)
>
> x <- c(letters, 1:26)
>
> x[1:10] <- paste(x[1:10], sample(1:10, 10), sep='')
>
> x <- rep(x, 1e3)
>
> > str(x)
> chr [1:52000] "a2" "b10" "c8" "d3" "e6" "f1" "g5" ...
>
> > system.time(is.letter(x))
>    user  system elapsed
>   0.011   0.000   0.010
>
> > system.time(is.number(x))
>    user  system elapsed
>   0.010   0.000   0.011
>
>
> Regards,
>
> Marc Schwartz
>
> On Aug 6, 2012, at 11:51 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>
> > Hello,
> >
> > Fun as an exercise in vectorization. 30 times faster. Don't look, guess.
> >
> > Given up? OK, here it is.
> >
> >
> > is_letter <- function(x, pattern=c(letters, LETTERS)){
> >     sapply(x, function(y){
> >         any(sapply(pattern, function(z) grepl(z, y, fixed=TRUE)))
> >     })
> > }
> > # test ascii codes, just one loop.
> > has_letter <- function(x){
> >     sapply(x, function(y){
> >         y <- as.integer(charToRaw(y))
> >         any((65 <= y & y <= 90) | (97 <= y & y <= 122))
> >     })
> > }
> >
> > x <- c(letters, 1:26)
> > x[1:10] <- paste(x[1:10], sample(1:10, 10), sep='')
> > x <- rep(x, 1e3)
> >
> > t1 <- system.time(is_letter(x))
> > t2 <- system.time(has_letter(x))
> > rbind(t1, t2, t1/t2)
> >    user.self sys.self elapsed user.child sys.child
> > t1     15.69        0   15.74         NA        NA
> > t2      0.50        0    0.50         NA        NA
> >        31.38      NaN   31.48         NA        NA
> >
> >
> > On 06-08-2012 17:25, Liviu Andronic wrote:
> >> Dear all
> >> I'm pretty sure that I'm approaching the problem in a wrong way.
> >> Suppose the following character vector:
> >>> (x[1:10] <- paste(x[1:10], sample(1:10, 10), sep=''))
> >>  [1] "a10" "b7"  "c2"  "d3"  "e6"  "f1"  "g5"  "h8"  "i9"  "j4"
> >>> x
> >>  [1] "a10" "b7"  "c2"  "d3"  "e6"  "f1"  "g5"  "h8"  "i9"  "j4"  "k"   "l"   "m"   "n"
> >> [15] "o"   "p"   "q"   "r"   "s"   "t"   "u"   "v"   "w"   "x"   "y"   "z"   "1"   "2"
> >> [29] "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"  "11"  "12"  "13"  "14"  "15"  "16"
> >> [43] "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24"  "25"  "26"
> >>
> >>
> >> How do you test whether the elements of the vector contain at least
> >> one letter (or at least one digit) and obtain a logical vector of the
> >> same dimension? I came up with the following awkward function:
> >> is_letter <- function(x, pattern=c(letters, LETTERS)){
> >>     sapply(x, function(y){
> >>         any(sapply(pattern, function(z) grepl(z, y, fixed=TRUE)))
> >>     })
> >> }
> >>
> >>> is_letter(x)
> >>   a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k     l     m     n     o
> >>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
> >>     p     q     r     s     t     u     v     w     x     y     z     1     2     3     4
> >>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
> >>     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
> >> FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> >>    20    21    22    23    24    25    26
> >> FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> >>> is_letter(x, 0:9) ##function slightly misnamed
> >>   a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k     l     m     n     o
> >>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
> >>     p     q     r     s     t     u     v     w     x     y     z     1     2     3     4
> >> FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> >>     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
> >>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
> >>    20    21    22    23    24    25    26
> >>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
> >>
> >>
> >> Is there a nicer way to do this? Regards
> >> Liviu
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.