[R] Counting the occurrence of each unique "character string"
Gabor Grothendieck
ggrothendieck at gmail.com
Sun Nov 27 20:16:34 CET 2005
On 11/27/05, Ted Harding <Ted.Harding at nessie.mcc.ac.uk> wrote:
> On 27-Nov-05 Marco Visser wrote:
> > LS,
> >
> > I would really like to know how to count the frequency/occurrence of
> > character strings inside a dataset. I am working with extremely large
> > datasets of forest inventory data with a large variety of different
> > species in them.
> > Each row inside the dataframe represents one individual tree and the
> > simplified dataframe looks something like this:
> >
> >   num species dbh
> >     1     sp1  30
> >     2     sp1  20
> >     3     sp2  30
> >     4     sp1  40
> >
> > I need to be able to count the number of individuals per species, so
> > I need a command that will return, for each unique species, its
> > number of occurrences inside the dataframe:
> >
> > [sp1] 3
> > [sp2] 1
>
> Does the following help? (Using an artificial example a bit more
> complicated than yours). The dataframe "trees" consists of a list
> of species names under "Species", and values of a numeric variable
> under "X".
>
>
> > trees
>             Species   X
> 1     Larix decidua 203
> 2  Pinus sylvestris 303
> 3     Larix decidua 202
> 4  Pinus sylvestris 301
> 5       Picea abies 102
> 6       Picea abies 103
> 7  Pinus sylvestris 302
> 8       Picea abies 101
> 9     Larix decidua 201
> 10      Picea abies 104
> 11      Picea abies 105
> 12 Pinus sylvestris 304
>
>
> > freqs<-as.data.frame(table(trees$Species))
> > colnames(freqs)<-c("Species","Counts")
> > freqs
>            Species Counts
> 1    Larix decidua      3
> 2      Picea abies      5
> 3 Pinus sylvestris      4
>
>
> > mean(freqs$Counts)
> [1] 4
> > sd(freqs$Counts)
> [1] 1
>
>
> Just using table() would give you the same information, but
> converting it to a dataframe makes that information more
> readily accessible by familiar methods.
>
> Hoping this helps,
> Ted.
>
>
Or, using the iris dataset that comes with R and calling
as.data.frame.table directly, we can shorten that slightly to just:
as.data.frame.table(table(Species = iris$Species), responseName = "Count")
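The same one-liner can be tried on the toy data from the original question. The following is only a sketch: the data frame name "trees" and the column names are taken from the posts above, and the values are typed in by hand for illustration.

```r
# Toy data from the original question (four trees, two species).
trees <- data.frame(
  num     = 1:4,
  species = c("sp1", "sp1", "sp2", "sp1"),
  dbh     = c(30, 20, 30, 40)
)

# Naming the argument to table() sets the name of the first column;
# responseName sets the name of the count column.
counts <- as.data.frame.table(table(species = trees$species),
                              responseName = "Count")
print(counts)
```

This should give one row per species, with sp1 counted three times and sp2 once, which is the layout asked for in the question.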
Incidentally, I just noticed an inconsistency between as.data.frame
and as.data.frame.table: the method has a responseName= argument that
the generic's argument list does not include, so as.data.frame.table
cannot be shortened to as.data.frame in the call above.
> args(as.data.frame)
function (x, row.names = NULL, optional = FALSE)
NULL
> args(as.data.frame.table)
function (x, row.names = NULL, optional = FALSE, responseName = "Freq")
NULL
> R.version.string # Windows
[1] "R version 2.2.0, 2005-10-24"
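For anyone reading this with a more recent R than the 2.2.0 shown above: later versions of the as.data.frame generic accept a "..." argument, so the shorter call should work there. The sketch below assumes such a version and checks for the "..." argument first rather than taking it for granted.

```r
# Assumption: a recent R in which the as.data.frame generic has "...".
# On R 2.2.0 (the version in this post) the check below would fail.
has_dots <- "..." %in% names(formals(as.data.frame))
stopifnot(has_dots)

# With "..." in the generic, responseName is passed on to the table method.
counts <- as.data.frame(table(Species = iris$Species),
                        responseName = "Count")
print(counts)
```

The iris data has 50 rows for each of its three species, so each Count should be 50.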