[R] How to group by then count?

MacQueen, Don macqueen1 at llnl.gov
Sun Jan 4 23:03:46 CET 2015

This seems to me to be a case where thinking in terms of computer
programming concepts is getting in the way a bit. Approach it as a data
analysis task; the S language (upon which R is based) is designed in part
for data analysis so there is a function that does most of the job for you.

(I changed your vector of strings to make the result easier to interpret.)

> x = c("1", "1", "2", "1", "5", "2",'3','5','5','2','2')
> tmp <- table(x)      ## counts the number of appearances of each element
> tmp[tmp==max(tmp)]   ## finds which one occurs most often

The result shows that the element '2' appears 4 times.  The table() function
should be fast even with long vectors. Here's an example with a vector of
length 1 million:

foo <- table( sample(letters, 1e6, replace=TRUE) )
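To check the speed on your own machine, a rough sketch along these lines may help. The variable names big and timing and the seed are my own choices for illustration, and the elapsed time will vary by machine, so none is quoted here.

```r
set.seed(1)  # arbitrary seed, only so the run is repeatable
big <- sample(letters, 1e6, replace = TRUE)

# system.time() reports how long it takes to evaluate its argument
timing <- system.time(foo <- table(big))
print(timing["elapsed"])

length(foo)  # number of distinct letters that were drawn
sum(foo)     # the counts add back up to the length of the vector
```

The last two lines are just a sanity check that every element was counted exactly once.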

One of the seminal books on the S language is John M Chambers' Programming
with Data -- and I would emphasize the "with Data" part of that title.
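In that same data-oriented spirit, here is a small sketch of a few follow-up steps on the example vector: ranking all the counts, and pulling out just the label of the most frequent string. The name majority is only illustrative, and this is one way of doing it rather than the only one.

```r
x <- c("1", "1", "2", "1", "5", "2", "3", "5", "5", "2", "2")
tmp <- table(x)

# All counts, most frequent first
sort(tmp, decreasing = TRUE)

# Label(s) of the most frequent value; gives several labels if there is a tie
majority <- names(tmp)[tmp == max(tmp)]
majority

# which.max() reports only the first maximum, so it would hide a tie
names(which.max(tmp))
```

Comparing against max(tmp) keeps every tied label, which is usually what you want when asking for "the majority".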


Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550

On 1/4/15, 1:02 AM, "Monnand" <monnand at gmail.com> wrote:

>Hi all,
>I thought this was a very naive problem, but I have not found any solution
>that is idiomatic to R.
>The problem is this:
>Assume we have a vector of strings:
> x = c("1", "1", "2", "1", "5", "2")
>We want to count the number of appearances of each string, i.e. in vector x,
>string "1" appears 3 times, "2" appears twice and "5" appears once. Then I
>want to know which string is the majority. In this case, it is "1".
>For imperative languages like C, C++, Java and Python, I would use a hash
>table to count the strings, where the keys are the strings and the values are
>the number of appearances. For functional languages like Clojure, there are
>higher-order functions like group-by.
>However, for R, I can hardly find a good solution to this simple problem. I
>found a hash package, which implements a hash table. However, installing a
>package simply for a hash table is really annoying for me. I did find
>aggregate and other functions which operate on data frames. But in my
>case, it is a simple vector. Converting it to a data frame may not be
>desirable. (Or is it?)
>Could anyone suggest an idiomatic way of doing such a job in R? I would
>appreciate your help!
