[R] cluster analysis using Dmax
Kris Lockyear
noviodunum at hotmail.com
Wed Nov 1 15:22:52 CET 2006
Dear All,
A long time ago I ran a cluster analysis in which the dissimilarity matrix
consisted of Dmax (or Kolmogorov-Smirnov distance) values, in other words
the maximum difference between two cumulative proportion curves. This all
worked very well indeed. The matrix was calculated using dBase III+, which
took a day and a half, and the clustering was done using MV-ARCH, with the
resultant dendrogram converted from HP Plotter language to PostScript
manually. As you might guess, I'd like to be able to do this more
efficiently in R.
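
To make the definition concrete, here is a minimal sketch in R of Dmax for a single pair of items. It assumes each item is recorded as a vector of counts over the same ordered classes. The function name `dmax` and the example counts are hypothetical and only there for illustration.

```r
# Dmax between two items, each given as counts over the same ordered classes.
# `dmax` and the example counts below are hypothetical, for illustration only.
dmax <- function(x, y) {
  cx <- cumsum(x) / sum(x)  # cumulative proportion curve for x
  cy <- cumsum(y) / sum(y)  # cumulative proportion curve for y
  max(abs(cx - cy))         # largest vertical gap between the two curves
}

a <- c(10, 20, 30, 40)
b <- c(40, 30, 20, 10)
dmax(a, b)
```

Both curves end at 1, so the value always lies between 0 (identical curves) and 1.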
I have looked through the various help files and found that some of the
clustering routines will take a dissimilarity matrix as input (yay!).
My questions (as a very novice R user) are:
a) how would one go about calculating the matrix of Dmax/KS distance values?
b) of the many clustering packages (I'll be doing a simple average link
hierarchical clustering) is there one where I can ask: "If I 'cut' the
dendrogram at the 0.x dissimilarity level, which items are in which
clusters?" (As my dataset has over 200 items, this is non-trivial to work out
manually.)
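
One possible route for both questions, sketched with base R only. The assumption here is that the data sit in a matrix of counts with one row per item and one column per ordered class. The `counts` matrix below is made-up data, and the cut level of 0.1 is arbitrary. The sketch relies on `dist()` with `method = "maximum"`, which returns the largest absolute difference between two rows. Applied to rows of cumulative proportions, that is the Dmax value.

```r
# Hypothetical data: 6 items (rows) with counts in 5 ordered classes (columns).
set.seed(1)
counts <- matrix(rpois(30, lambda = 10) + 1, nrow = 6,
                 dimnames = list(paste0("item", 1:6), NULL))

# Turn each row into a cumulative proportion curve.
cum_prop <- t(apply(counts, 1, function(x) cumsum(x) / sum(x)))

# Largest absolute difference between each pair of rows, i.e. Dmax.
d <- dist(cum_prop, method = "maximum")

# Average-link hierarchical clustering on the dissimilarity matrix.
hc <- hclust(d, method = "average")

# Cut the dendrogram at a chosen dissimilarity level (0.1 is arbitrary here).
groups <- cutree(hc, h = 0.1)

# Item names listed cluster by cluster.
split(names(groups), groups)
```

`plot(hc)` draws the dendrogram, and `cutree(hc, k = ...)` asks for a fixed number of clusters instead of a cut height.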
Many thanks indeed for your help.
Kris Lockyear.