[R] Variance explained by cluster analysis
jiho
jo.irisson at gmail.com
Tue Aug 28 20:25:30 CEST 2007
Hello,
As suggested in "De'ath, 2002. Multivariate regression trees: A new
technique for modelling species-environment relationships. Ecology, 83
(4):1105-1117" (for those interested), I am trying to compare the
performance of a multivariate regression tree to a cluster analysis.
A simple partitioning with k clusters (as done by `pam`) seemed
straightforward and appropriate to compare to an MRT with k leaves.
Now I am looking for a measure of how much variance each of these
methods explains. The MRT analysis provides me with such a measure. I
was wondering what I could use in a cluster analysis. When plotting
the pam object as a clusplot (which.plots = 1), there is a message at the
bottom of the plot: "These two components explain x% of the point
variability". Can I safely assume that this is the percentage of
variance explained by the k clusters? Is there anything else that I
could compute?
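
To make the question concrete, here is a minimal sketch of one candidate
measure: the share of the total sum of squares that is not left within the
clusters (1 - within SS / total SS). It assumes Euclidean distances and
cluster centroids, whereas pam itself works with medoids and dissimilarities,
so it is only an approximation of what pam optimises. The helper name
`ss_explained` is hypothetical (it is not part of the cluster package), and
the iris data merely stand in for a real species matrix.

```r
library(cluster)  # provides pam()

# Hypothetical helper: proportion of the total sum of squares
# accounted for by a given partition (centroid-based, Euclidean).
ss_explained <- function(x, clustering) {
  x <- as.matrix(x)
  # Total sum of squares around the overall centroid
  total_ss <- sum(scale(x, center = TRUE, scale = FALSE)^2)
  # Within-cluster sum of squares around each cluster's own centroid
  groups <- split(as.data.frame(x), clustering)
  within_ss <- sum(sapply(groups, function(g) {
    sum(scale(as.matrix(g), center = TRUE, scale = FALSE)^2)
  }))
  1 - within_ss / total_ss
}

x <- iris[, 1:4]          # stand-in data, an assumption for illustration
fit <- pam(x, k = 3)      # partition into k = 3 clusters
ss_explained(x, fit$clustering)
```

If the MRT error is also expressed as a within-leaf sum of squares relative
to the total, then 1 minus that relative error would be on the same scale as
the value returned here, which is what would make the two directly
comparable.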
More generally, am I totally wrong in comparing these two methods?
Are there some references particularly appropriate to this? (NB: I am
already hunting down the Kaufman, L. and Rousseeuw book.)
Thank you in advance for your help.
JiHO
---
http://jo.irisson.free.fr/