[R] test if elements of a character vector contain letters

Marc Schwartz marc_schwartz at me.com
Tue Aug 7 22:26:35 CEST 2012


On Aug 7, 2012, at 3:18 PM, Marc Schwartz <marc_schwartz at me.com> wrote:

> 
> On Aug 7, 2012, at 3:02 PM, Liviu Andronic <landronimirc at gmail.com> wrote:
> 
>> On Mon, Aug 6, 2012 at 7:35 PM, Marc Schwartz <marc_schwartz at me.com> wrote:
>>> is.letter <- function(x) grepl("[[:alpha:]]", x)
>>> is.number <- function(x) grepl("[[:digit:]]", x)
>>> 
>> 
>> Another follow-up. To test for (non-)alphanumeric one would do the following:
>>> x <- c(letters, 1:26, '+', '-', '%^&')
>>> x[1:10] <- paste(x[1:10], 1:10, sep='')
>>> x
>> [1] "a1"  "b2"  "c3"  "d4"  "e5"  "f6"  "g7"  "h8"  "i9"  "j10" "k"
>> "l"   "m"   "n"
>> [15] "o"   "p"   "q"   "r"   "s"   "t"   "u"   "v"   "w"   "x"   "y"
>> "z"   "1"   "2"
>> [29] "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"  "11"  "12"  "13"
>> "14"  "15"  "16"
>> [43] "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24"  "25"  "26"  "+"
>> "-"   "%^&"
>>> xb <- grepl("[[:alnum:]]",x)  ##test for alphanumeric chars
>>> x[xb]
>> [1] "a1"  "b2"  "c3"  "d4"  "e5"  "f6"  "g7"  "h8"  "i9"  "j10" "k"
>> "l"   "m"   "n"
>> [15] "o"   "p"   "q"   "r"   "s"   "t"   "u"   "v"   "w"   "x"   "y"
>> "z"   "1"   "2"
>> [29] "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"  "11"  "12"  "13"
>> "14"  "15"  "16"
>> [43] "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24"  "25"  "26"
>>> xb <- grepl("[[:punct:]]",x)  ##test for non-alphanumeric chars
>>> x[xb]
>> [1] "+"   "-"   "%^&"
> 
> 
> That will find the values that contain punctuation characters, but the vector may hold other non-alphanumeric characters as well: ASCII control codes, tabs, newlines (CR, LF), spaces, etc., none of which are matched by [:punct:].
> 
> For example:
> 
>> grepl("[[:punct:]]", " ")
> [1] FALSE
> 
> 
> If you want to explicitly look for non-alphanumeric characters, you would be better off using a negation of [:alnum:] such as:
> 
> grepl("[^[:alnum:]]", x)
> 
> or
> 
> !grepl("[[:alnum:]]", x)
> 



Actually (for the second time in two days) I need to correct myself. The second option does not work correctly when an element contains a mix of alphanumeric and non-alphanumeric characters:

> !grepl("[[:alnum:]]", "ab%")
[1] FALSE

since alphanumeric characters are present in the string, whereas the first option will flag it:

> grepl("[^[:alnum:]]", "ab%")
[1] TRUE


So, use the first option.
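
To make the difference between the two options concrete, here is a minimal sketch. The helper names has_non_alnum() and no_alnum() and the test vector are made up for illustration and are not part of the thread above; the results also assume plain ASCII input, since what [:alnum:] matches depends on the locale.

```r
# Hypothetical helpers, named only for this illustration.
# TRUE if the element contains at least one non-alphanumeric character.
has_non_alnum <- function(x) grepl("[^[:alnum:]]", x)
# TRUE only if the element contains no alphanumeric character at all.
no_alnum <- function(x) !grepl("[[:alnum:]]", x)

# Illustrative input: mixed, purely alphanumeric, punctuation-only,
# a single space, and strings with an embedded space or tab.
y <- c("ab%", "abc", "123", "%^&", " ", "a b", "a\tb")

res1 <- has_non_alnum(y)  # first option
res2 <- no_alnum(y)       # second option

# The elements on which the two options disagree are exactly the mixed ones.
y[res1 != res2]
```

The two tests answer different questions: the first asks "is there any non-alphanumeric character?", the second asks "is there no alphanumeric character at all?". They agree on "abc" and "%^&" but disagree on mixed strings such as "ab%" or "a b".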

Regards,

Marc <who is heading to the coffee machine...>


