[R] test if elements of a character vector contain letters
Marc Schwartz
marc_schwartz at me.com
Tue Aug 7 22:26:35 CEST 2012
On Aug 7, 2012, at 3:18 PM, Marc Schwartz <marc_schwartz at me.com> wrote:
>
> On Aug 7, 2012, at 3:02 PM, Liviu Andronic <landronimirc at gmail.com> wrote:
>
>> On Mon, Aug 6, 2012 at 7:35 PM, Marc Schwartz <marc_schwartz at me.com> wrote:
>>> is.letter <- function(x) grepl("[[:alpha:]]", x)
>>> is.number <- function(x) grepl("[[:digit:]]", x)
>>>
>>
>> Another follow-up. To test for (non-)alphanumeric one would do the following:
>>> x <- c(letters, 1:26, '+', '-', '%^&')
>>> x[1:10] <- paste(x[1:10], 1:10, sep='')
>>> x
>> [1] "a1" "b2" "c3" "d4" "e5" "f6" "g7" "h8" "i9" "j10" "k" "l" "m" "n"
>> [15] "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z" "1" "2"
>> [29] "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" "16"
>> [43] "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "+" "-" "%^&"
>>> xb <- grepl("[[:alnum:]]",x) ##test for alphanumeric chars
>>> x[xb]
>> [1] "a1" "b2" "c3" "d4" "e5" "f6" "g7" "h8" "i9" "j10" "k" "l" "m" "n"
>> [15] "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z" "1" "2"
>> [29] "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" "16"
>> [43] "17" "18" "19" "20" "21" "22" "23" "24" "25" "26"
>>> xb <- grepl("[[:punct:]]",x) ##test for non-alphanumeric chars
>>> x[xb]
>> [1] "+" "-" "%^&"
>
>
> That will get you the values that contain punctuation characters, but there may be other non-alphanumeric characters in the vector, such as ASCII control codes, tabs, newlines (CR, LF) and spaces, which would not be found by using [:punct:].
>
> For example:
>
>> grepl("[[:punct:]]", " ")
> [1] FALSE
>
>
> If you want to explicitly look for non-alphanumeric characters, you would be better off using a negation of [:alnum:] such as:
>
> grepl("[^[:alnum:]]", x)
>
> or
>
> !grepl("[[:alnum:]]", x)
>
Actually (for the second time in two days) I need to correct myself. The second option would not work correctly when an element mixes alphanumeric and non-alphanumeric characters:
> !grepl("[[:alnum:]]", "ab%")
[1] FALSE
since alphanumeric characters are present, whereas the first option will:
> grepl("[^[:alnum:]]", "ab%")
[1] TRUE
So, use the first option.
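
To make the difference between the two options concrete, here is a minimal sketch. The helper names has_non_alnum and has_no_alnum are hypothetical (they are not from this thread); they simply wrap the two grepl() calls shown above. The results assume ASCII input, since what counts as [:alnum:] can depend on the locale.

```r
# Hypothetical helper names, used only for illustration.

# TRUE if the element contains at least one non-alphanumeric character
# (the first option above).
has_non_alnum <- function(x) grepl("[^[:alnum:]]", x)

# TRUE only if the element contains no alphanumeric character at all
# (the second option above).
has_no_alnum <- function(x) !grepl("[[:alnum:]]", x)

x <- c("ab", "ab%", "%^&", " ", "a b")

has_non_alnum(x)
# [1] FALSE  TRUE  TRUE  TRUE  TRUE

has_no_alnum(x)
# [1] FALSE FALSE  TRUE  TRUE FALSE
```

The two only disagree on mixed elements such as "ab%" and "a b", which is exactly the case described above. Note also that the single space is caught by both, whereas [:punct:] would miss it.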
Regards,
Marc <who is heading to the coffee machine...>