[R] cluster analysis using Dmax

Kris Lockyear noviodunum at hotmail.com
Wed Nov 1 15:22:52 CET 2006


Dear All,

A long time ago I ran a cluster analysis in which the dissimilarity matrix
consisted of Dmax (or Kolmogorov-Smirnov distance) values, in other words
the maximum absolute difference between two cumulative proportion curves.
This worked very well indeed. The matrix was calculated in Dbase III+, which
took a day and a half; the clustering was done in MV-ARCH, and the resulting
dendrogram was converted manually from HP plotter language to PostScript.
As you might guess, I'd like to be able to do this more efficiently in R.
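To make the Dmax definition concrete, here is a minimal sketch (in Python, purely for illustration) for one pair of items. It assumes each item is recorded as counts over the same ordered categories (for example, periods), so each cumulative proportion curve is the running total divided by the item's overall count. The function names are hypothetical.

```python
from itertools import accumulate


def cumulative_proportions(counts):
    """Turn counts over ordered categories into a cumulative proportion curve."""
    total = sum(counts)
    if total == 0:
        raise ValueError("an item must have at least one non-zero count")
    return [running / total for running in accumulate(counts)]


def dmax(counts_a, counts_b):
    """Maximum absolute difference between two cumulative proportion curves."""
    if len(counts_a) != len(counts_b):
        raise ValueError("both items must use the same ordered categories")
    curve_a = cumulative_proportions(counts_a)
    curve_b = cumulative_proportions(counts_b)
    return max(abs(a - b) for a, b in zip(curve_a, curve_b))


# Hypothetical example: two items counted over four ordered periods.
# Curves are (0.1, 0.3, 0.6, 1.0) and (0.4, 0.7, 0.9, 1.0), so Dmax is about 0.4.
print(dmax([10, 20, 30, 40], [40, 30, 20, 10]))
```

Dmax always lies between 0 (identical curves) and 1, which is why it works directly as a dissimilarity for clustering.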

I have looked through the various help files and found that some of the 
clustering routines will take a dissimilarity matrix as input (yay!).

My questions (as a complete R novice) are:

a) how would one go about calculating the matrix of Dmax/KS distance values?

b) of the many clustering packages (I'll be doing a simple average-link
hierarchical clustering), is there one where I can ask: "If I 'cut' the
dendrogram at the 0.x dissimilarity level, which items are in which
clusters?" (With over 200 items in my dataset, this is non-trivial to work
out manually.)
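Both questions can be sketched end to end, again in Python purely for illustration and under the same assumption that each item is a vector of counts over shared ordered categories: build the full pairwise Dmax matrix, run average-link (UPGMA) clustering on it, and stop merging once the smallest average between-cluster dissimilarity exceeds the chosen cut level. Because average linkage never produces inversions in the dendrogram, stopping at height h should give the same groups as cutting the complete tree at h, ties aside. The helper names are hypothetical. In R itself, the stats package's `hclust()` with `method = "average"` applied to an `as.dist()` object, followed by `cutree()` with its `h` argument, covers the same ground.

```python
from itertools import accumulate


def dmax(counts_a, counts_b):
    """KS-style distance between two count vectors over the same ordered categories."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    cum_a = [c / total_a for c in accumulate(counts_a)]
    cum_b = [c / total_b for c in accumulate(counts_b)]
    return max(abs(a - b) for a, b in zip(cum_a, cum_b))


def dmax_matrix(items):
    """Symmetric matrix of pairwise Dmax values with zeros on the diagonal."""
    n = len(items)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = dmax(items[i], items[j])
    return d


def average_link_cut(d, height):
    """Average-link clustering, stopped at `height`; returns one label per item."""
    clusters = [[i] for i in range(len(d))]

    def avg_link(a, b):
        # Mean dissimilarity over all cross-cluster pairs of items (UPGMA).
        return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                dist = avg_link(clusters[x], clusters[y])
                if best is None or dist < best[0]:
                    best = (dist, x, y)
        dist, x, y = best
        if dist > height:
            break  # every remaining merge lies above the cut
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

    labels = [0] * len(d)
    for label, members in enumerate(clusters, start=1):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical data: two pairs of similar items over four ordered periods.
items = [
    [10, 20, 30, 40],
    [12, 18, 32, 38],
    [40, 30, 20, 10],
    [38, 32, 18, 12],
]
print(average_link_cut(dmax_matrix(items), 0.1))  # → [1, 1, 2, 2]
```

This naive version recomputes average links at every step, which is roughly cubic in the number of items, but still manageable for a couple of hundred items.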

Many thanks indeed for your help.

Kris Lockyear.
