[R-sig-eco] CoDA: Clustering Multiple Data Sets

Fri Oct 10 16:39:50 CEST 2014

On Fri, 10 Oct 2014, separent at yahoo.com wrote:

> It is not clear whether you need a supervised or an unsupervised model.
> Clustering is unsupervised: it will classify compositions into hierarchical
> groups regardless of the label (countries, regions). If this is what you
> intend, you might compute the clustering (hclust) on a Euclidean distance
> matrix (vegdist) computed from the clr- or ilr-transformed data (both
> return the same distances). If you mean a supervised approach, you might
> want to explain how groups differ, and/or predict to which group the
> composition belongs. To explain, discriminant analysis (packages MASS or
> ade4) is (arguably) often a good choice. To predict a category, you might
> look at machine learning techniques (see caret package among many others).
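
A minimal sketch of the unsupervised route described above, using base R
only. The toy data, the object names, the Helmert-based ilr basis, and the
choice of average linkage and k = 2 are all assumptions made for
illustration; none of them come from the thread. It also assumes the
compositions contain no zeros, since the log-ratio transforms are undefined
there. Packaged versions of these steps (for example clr() and ilr() in the
compositions package, or vegdist(method = "euclidean") in vegan) may be
preferable in practice.

```r
# Hypothetical toy data: 6 sites (rows) by 4 parts (columns), strictly positive.
set.seed(1)
x <- matrix(runif(24, min = 1, max = 10), nrow = 6,
            dimnames = list(paste0("site", 1:6), paste0("part", 1:4)))
x <- x / rowSums(x)                  # close each row so it sums to 1

# Centred log-ratio (clr): log of each part minus the row mean of the logs.
clr <- log(x) - rowMeans(log(x))

# One possible ilr basis (an assumption): normalised Helmert contrasts,
# whose columns are orthonormal and each sum to zero.
V <- contr.helmert(ncol(x))
V <- sweep(V, 2, sqrt(colSums(V^2)), "/")
ilr <- clr %*% V

# Euclidean distances (the default of dist) on either transform.
d_clr <- dist(clr)
d_ilr <- dist(ilr)
same <- isTRUE(all.equal(as.vector(d_clr), as.vector(d_ilr)))

# Hierarchical clustering on the distances, then cut into two groups.
hc <- hclust(d_clr, method = "average")
groups <- cutree(hc, k = 2)
```

Here `same` checks the point made in the quoted message that clr and ilr
return the same distances, and plot(hc) would draw the dendrogram.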

Serge-Étienne,

   It would be an unsupervised model. But, more importantly, you let me see
outside the rut into which I wandered. Categorizing streams based on
functional composition, then classifying new streams based on those
categories has not been a completely satisfying solution, and after mailing
my message yesterday I decided to look at a better paradigm. There's a
reason why clustering across multiple compositional data sets has not been
commonly used in the literature I've read.

   Time to step back and examine various multivariate regression approaches;
the intended use of these compositional data is to explain water quality
based on the biota present.

   Thanks for your valuable input.

Carpe weekend,

Rich