[R] Frequency of a character in a string
Charles C. Berry
ccberry at ucsd.edu
Mon Nov 14 18:26:13 CET 2016
On Mon, 14 Nov 2016, Bert Gunter wrote:
> Yes, but it needs some help, since nchar gives the length of the
> *entire* string; e.g.
>
> ## to count "a" 's :
>
> > x <- c("abbababba","bbabbabbaaaba")
> > nchar(gsub("[^a]","",x))
> [1] 4 6
>
> This is one of about 8 zillion ways to do this in base R if you don't
> want to use a specialized package.
>
> Just out of curiosity: can anyone comment on what is the most efficient
> way to do this using base R pattern matching?
>
Most efficient? There is probably no uniformly most efficient way to do
this, as the timing will depend on how the "a"s are distributed within the
elements of the vector as well as on the length of the vector.
But here is one way to avoid the regular expression matching:
lengths(strsplit(paste0("X", x, "X"),"a",fixed=TRUE)) - 1
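A minimal sketch comparing the two approaches from this thread, for anyone who wants to try them side by side. The function names (count_regex, count_split), the edge-case vector and the replication factor are made up for illustration and are not from the original posts. The "X" padding is there because strsplit() does not return a trailing empty string when the match is at the end of the input, so an unpadded split can undercount strings that end in "a".

```r
# Example vector from the thread
x <- c("abbababba", "bbabbabbaaaba")

# Regex approach: delete everything that is not "a", count what is left
count_regex <- function(s) nchar(gsub("[^a]", "", s))

# Fixed-string approach: pad with "X" on both sides, split on "a",
# and count the pieces; n pieces means n - 1 separators
count_split <- function(s) {
  lengths(strsplit(paste0("X", s, "X"), "a", fixed = TRUE)) - 1L
}

count_regex(x)   # 4 6, as in the quoted output
count_split(x)   # 4 6

# Hypothetical edge cases: empty string, "a" at the start or end, no "a"
edge <- c("", "a", "ba", "ab", "bbb")
stopifnot(identical(count_split(edge), count_regex(edge)))

# Rough timing on a longer vector; results will vary with the data,
# the machine and the R version, so treat them as indicative only
big <- rep(x, 1e4)
system.time(count_regex(big))
system.time(count_split(big))
```

Which version is faster will depend on the input, so it is worth timing on data that resembles the real use case.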
Chuck