[R-sig-eco] CoDA: Clustering Multiple Data Sets

Rich Shepard rshepard at appl-ecosys.com
Thu Oct 9 21:13:16 CEST 2014


   The documentation for the packages compositions and robCompositions
describes distance measures and (in the former package) clustering. However,
all the examples, and the function syntax, apply to a single data set.

   This works well with geochemical and official statistical data when the
goal is to examine relationships among the components in one data set. I
find no examples for clustering multiple compositional data sets. For
example, suppose the expenditures (or expendituresEU) data sets in
robCompositions included data from multiple countries and the analytical
goal were to cluster the countries based on each one's compositional data
set. The AnimalVegetation data set in the compositions package compares
"[A]real compositions by abundance of vegetation and animals for 50 plots in
each of regions A and B" and appears to be similar to my data:
macroinvertebrate compositions by functional feeding groups over multiple
years (the number of years varies) in each of 6 stream networks; each stream
network is a separate data set. I want to cluster the streams based on each
data set. Unfortunately, I do not see an example in package compositions
that uses the AnimalVegetation data for clustering.

   The hclust() function in the stats and compositions packages (perhaps the
latter calls the function in the former package) appears to be limited to a
single data set.

   What package and function will allow me to calculate a distance matrix for
these 6 compositional data sets, then use those distances for hierarchical
clustering?
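
   To make the question concrete, here is a minimal sketch of the kind of
workflow I have in mind, using only base R. Everything in it is an
assumption for illustration: the data are simulated, the names (networks,
clr, centres) are hypothetical, and summarizing each stream network by the
mean of its clr-transformed rows is just one possible choice that ignores
within-network variability. It also assumes no zero parts, since the log
transform would fail on them.

```r
# Hypothetical data: 6 stream networks, each with a different number of
# years (rows) and 4 functional feeding groups (columns). Values simulated.
set.seed(1)
groups  <- c("shredder", "collector", "scraper", "predator")
n_years <- c(5, 8, 6, 7, 4, 9)
networks <- lapply(n_years, function(n) {
  x <- matrix(rgamma(n * length(groups), shape = 2), nrow = n,
              dimnames = list(NULL, groups))
  x / rowSums(x)   # close each row so the parts sum to 1
})
names(networks) <- paste0("net", seq_along(networks))

# Centred log-ratio (clr) transform applied to each row of a matrix.
clr <- function(x) {
  lx <- log(x)
  lx - rowMeans(lx)
}

# One summary row per network: the mean of its clr-transformed rows,
# which equals the clr of the network's closed geometric mean.
centres <- t(sapply(networks, function(x) colMeans(clr(x))))

# Euclidean distance between clr vectors is the Aitchison distance.
d <- dist(centres)

# Hierarchical clustering of the 6 networks from that distance matrix.
hc <- hclust(d, method = "average")
print(cutree(hc, k = 2))
```

   Calling plot(hc) would draw the dendrogram. The linkage method and the
number of groups passed to cutree() are arbitrary here.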

Rich


