[R-sig-eco] CoDA: Clustering Multiple Data Sets
Rich Shepard
rshepard at appl-ecosys.com
Fri Oct 10 16:39:50 CEST 2014
On Fri, 10 Oct 2014, separent at yahoo.com wrote:
> It is not clear whether you need a supervised or an unsupervised model.
> Clustering is unsupervised: it will classify compositions into hierarchical
> groups regardless of the label (countries, regions). If this is what you
> intend, you might compute the clustering (hclust) on a Euclidean distance
> matrix (vegdist) computed from the clr- or ilr-transformed data (both
> return the same distances). If you mean a supervised approach, you might
> want to explain how groups differ, and/or predict to which group the
> composition belongs. To explain, discriminant analysis (packages MASS or
> ade4) is (arguably) often a good choice. To predict a category, you might
> look at machine learning techniques (see caret package among many others).
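
For reference, the unsupervised route described above might look roughly like the sketch below. It uses base R only, so dist() stands in for vegdist, and the clr transform is written out by hand rather than taken from a package. The data matrix, the stream and group names, the Ward linkage, and the choice of three clusters are all made-up assumptions for illustration, not anything from the actual data sets.

```r
# Hypothetical data: 6 streams (rows) x 4 functional groups (columns).
# The counts and all names are invented purely for illustration.
counts <- matrix(
  c(40, 30, 20, 10,
    38, 32, 18, 12,
    10, 20, 30, 40,
    12, 18, 32, 38,
    25, 25, 25, 25,
    24, 26, 26, 24),
  ncol = 4, byrow = TRUE,
  dimnames = list(paste0("stream", 1:6),
                  c("shredders", "collectors", "scrapers", "predators"))
)

# Close each row to proportions (rows sum to 1).
comp <- counts / rowSums(counts)

# Centred log-ratio (clr): log of each part minus the row mean of the logs.
# Assumes no zero parts; zeros would need replacement before taking logs.
clr <- log(comp) - rowMeans(log(comp))

# Euclidean distance between clr-transformed rows (base R dist() here;
# vegdist(clr, method = "euclidean") from vegan is the call named above).
d <- dist(clr, method = "euclidean")

# Agglomerative hierarchical clustering; Ward linkage is an arbitrary choice.
hc <- hclust(d, method = "ward.D2")

# Cut the tree into three groups (the number of groups is an assumption).
groups <- cutree(hc, k = 3)
print(groups)
```

plot(hc) draws the dendrogram, which is usually a better guide to a sensible number of groups than fixing k in advance.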
Serge-Étienne,
It would be an unsupervised model. But, more importantly, you let me see
outside the rut into which I wandered. Categorizing streams based on
functional composition, then classifying new streams based on those
categories has not been a completely satisfying solution, and after mailing
my message yesterday I decided to look at a better paradigm. There's a
reason why clustering across multiple compositional data sets has not been
commonly used in the literature I've read.
Time to step back and examine various multivariate regression approaches;
the intended use of these compositional data is to explain water quality
based on the biota present.
Thanks for your valuable input.
Carpe weekend,
Rich