[R] Variance explained by cluster analysis

jiho jo.irisson at gmail.com
Tue Aug 28 20:25:30 CEST 2007


As suggested in "De'ath, 2002. Multivariate regression trees: A new
technique for modelling species-environment relationships. Ecology,
83(4):1105-1117" (for those interested), I am trying to compare the
performance of a multivariate regression tree to a cluster analysis.
A simple partitioning with k clusters (as done by `pam`) seemed  
straightforward and appropriate to compare to an MRT with k leaves.
Now I am looking for a measure of how much variance each of these  
methods explains. The MRT analysis provides me with such a measure. I  
was wondering what I could use in a cluster analysis. When plotting  
the pam object with which.plots=clusplot, there is a message at the  
bottom of the plot: "These two components explain x% of the point  
variability". Can I safely assume that this is a percentage of  
variance explained by the k clusters? Is there anything else that I  
could compute?
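
One candidate I can think of is the share of the total sum of squares that is removed by the partition, i.e. 1 - (within-cluster SS / total SS), which should be on the same scale as 1 minus the relative error of a tree. Below is a minimal sketch of that idea, not a definitive method. It assumes Euclidean distances and measures spread around cluster means (centroids), whereas `pam` itself works with medoids. `variance_explained` is a hypothetical helper of my own, not a function from the cluster package, and `iris` is used only as stand-in data.

```r
library(cluster)

# Hypothetical helper: proportion of the total sum of squares
# accounted for by a partition (1 - within SS / total SS).
# Assumes Euclidean distance and centroids (means), not medoids.
variance_explained <- function(x, clustering) {
  x <- as.matrix(x)
  # Total sum of squares around the overall centroid
  tss <- sum(scale(x, center = TRUE, scale = FALSE)^2)
  # Within-cluster sum of squares around each cluster's own centroid
  groups <- split(as.data.frame(x), clustering)
  wss <- sum(sapply(groups, function(g) {
    sum(scale(as.matrix(g), center = TRUE, scale = FALSE)^2)
  }))
  1 - wss / tss
}

x <- scale(iris[, 1:4])   # stand-in data, standardised
fit <- pam(x, k = 3)      # partition into k = 3 clusters
variance_explained(x, fit$clustering)
```

As far as I understand it, the percentage printed under the clusplot refers to the two principal components used to draw the plot, so it would not be the same quantity as the one computed here.
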
More generally, am I totally wrong in comparing these two methods?  
Are there some references particularly appropriate to this? (NB: I am  
already hunting down the Kaufman, L. and Rousseeuw

Thank you in advance for your help.

