[R] Counting the occurrence of each unique "character string"

Gabor Grothendieck ggrothendieck at gmail.com
Sun Nov 27 20:16:34 CET 2005


On 11/27/05, Ted Harding <Ted.Harding at nessie.mcc.ac.uk> wrote:
> On 27-Nov-05 Marco Visser wrote:
> > LS,
> >
> >   I would really like to know how to count the frequency/occurrence of
> > characters inside a dataset. I am working with extremely large datasets
> > of forest inventory data with a large variety of different species
> > inside it.
> >   Each row inside the dataframe represents one individual tree and the
> > simplified dataframe looks something like this:
> >
> >   num  species  dbh
> >   1    sp1      30
> >   2    sp1      20
> >   3    sp2      30
> >   4    sp1      40
> >
> >   I need to be able to count the number of individuals per species, so
> > I need a command that will return, for each unique species, its
> > occurrence inside the dataframe;
> >
> >   [sp1]     3
> >   [sp2]     1
>
> Does the following help? (Using an artificial example a bit more
> complicated than yours). The dataframe "trees" consists of a list
> of species names under "Species", and values of a numeric variable
> under "X".
>
>
>  > trees
>              Species   X
>  1     Larix decidua 203
>  2  Pinus sylvestris 303
>  3     Larix decidua 202
>  4  Pinus sylvestris 301
>  5       Picea abies 102
>  6       Picea abies 103
>  7  Pinus sylvestris 302
>  8       Picea abies 101
>  9     Larix decidua 201
>  10      Picea abies 104
>  11      Picea abies 105
>  12 Pinus sylvestris 304
>
>
>  > freqs<-as.data.frame(table(trees$Species))
>  > colnames(freqs)<-c("Species","Counts")
>  > freqs
>             Species Counts
>  1    Larix decidua      3
>  2      Picea abies      5
>  3 Pinus sylvestris      4
>
>
>  > mean(freqs$Counts)
>  [1] 4
>  > sd(freqs$Counts)
>  [1] 1
>
>
> Just using table() would give you the same information, but
> converting it to a dataframe makes that information more
> readily accessible by familiar methods.
>
> Hoping this helps,
> Ted.
>
>

or using the iris dataset that comes with R and making use
of as.data.frame.table we can shorten that slightly to just:

as.data.frame.table(table(Species = iris$Species), responseName = "Count")
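Applied to the data in the original question, the same call might look
like the sketch below. The data frame name "trees" and the variable
"counts" are assumptions made for illustration; the four rows are copied
from the example in the question.

```r
# Toy data copied from the original question (object names are illustrative)
trees <- data.frame(num     = 1:4,
                    species = c("sp1", "sp1", "sp2", "sp1"),
                    dbh     = c(30, 20, 30, 40))

# Tabulate the species column, then convert the table to a data frame
# whose count column is called "Count" instead of the default "Freq"
counts <- as.data.frame.table(table(species = trees$species),
                              responseName = "Count")
counts
```

This should give one row per species, with sp1 counted 3 times and sp2
once, matching the output asked for in the question.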

Incidentally, I just noticed that there is an inconsistency between
as.data.frame and as.data.frame.table: the responseName= argument is not
in the generic's argument list, so as.data.frame.table cannot be
shortened to as.data.frame in the above.

> args(as.data.frame)
function (x, row.names = NULL, optional = FALSE)
NULL
> args(as.data.frame.table)
function (x, row.names = NULL, optional = FALSE, responseName = "Freq")
NULL
> R.version.string # Windows
[1] "R version 2.2.0, 2005-10-24"
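One possible way around this, sketched here rather than offered as the
recommended approach, is to go through the generic without responseName=
and rename the count column afterwards. It assumes the default column
name is "Freq", as the args() output above shows; the variable name
"counts" is only illustrative.

```r
# Call the generic without responseName=, so the count column gets the
# default name "Freq"
counts <- as.data.frame(table(Species = iris$Species))

# Rename the default "Freq" column to "Count"
names(counts)[names(counts) == "Freq"] <- "Count"
counts
```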



