[R] difference between trees in R?

Friedrich Leisch Friedrich.Leisch at ci.tuwien.ac.at
Tue Aug 21 16:35:22 CEST 2001


>>>>> On Tue, 21 Aug 2001 10:09:27 -0400,
>>>>> Mark Robinson (MR) wrote:

  > Hi.
  > I am wondering if anybody has studied and/or written code in R to
  > calculate the distance between 2 "trees".  For example, if one does a
  > hierarchical agglomerative clustering and say, a hierachical divisive
  > clustering (represented as trees) and wishes to compute a metric on
  > them.  I am thinking of something like the symmetric difference as
  > mentioned in Margush and McMorris (1982).
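A minimal sketch of a symmetric-difference distance between two trees. It assumes each tree is reduced to the set of clusters (leaf sets) at its internal nodes, and the distance is the number of clusters that occur in one tree but not the other. This is in the spirit of the idea mentioned above, not necessarily the exact definition in Margush and McMorris. Trees are written as nested tuples purely for illustration, and the function names are hypothetical.

```python
def clusters(tree):
    """Collect the leaf set of every internal node of a nested-tuple tree."""
    found = set()

    def leaves(node):
        # A non-tuple node is a single leaf label.
        if not isinstance(node, tuple):
            return frozenset([node])
        s = frozenset().union(*(leaves(child) for child in node))
        found.add(s)  # record this internal node's cluster
        return s

    leaves(tree)
    return found


def symmetric_difference_distance(t1, t2):
    """Count clusters present in exactly one of the two trees."""
    return len(clusters(t1) ^ clusters(t2))


t1 = ((("a", "b"), "c"), ("d", "e"))
t2 = (("a", ("b", "c")), ("d", "e"))
print(symmetric_difference_distance(t1, t2))  # 2: {a,b} vs {b,c}
```

Singleton clusters are ignored because every tree on the same leaves contains them, so they never contribute to the difference.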

  > My application is actually a bit different from the one above, so I'll
  > describe it.  I actually want to combine numerous k-means
  > classifications into one.  Because successive runs of the k-means
  > procedure give different cluster memberships (because of
  > different starting points), I wanted to run it a bunch of times and
  > combine the results into a consensus.  But to do that, I wanted to quantify how
  > different a consensus of, for example, 3 k-means runs is from a
  > consensus of 4 k-means runs (denoted here by d(3,4)).

  > Presumably, the sequence d(3,4), d(4,5), ..., d(p,p+1) would keep
  > decreasing, and at some point I would be satisfied that adding further
  > k-means runs to the consensus is unnecessary.
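One possible way to make d(p,p+1) concrete (an assumption, not something proposed in this thread) is a co-association consensus, sometimes called evidence accumulation: entry (i, j) of the consensus matrix is the fraction of runs that put observations i and j in the same cluster. d is then taken here as the mean absolute difference between consecutive consensus matrices over distinct pairs. All function names are hypothetical.

```python
from itertools import combinations


def coassociation(runs):
    """Consensus matrix: [i][j] = fraction of runs placing i and j together."""
    n, p = len(runs[0]), len(runs)
    counts = [[0] * n for _ in range(n)]
    for labels in runs:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / p for c in row] for row in counts]


def consensus_distance(c1, c2):
    """Mean absolute difference over distinct pairs i < j (hypothetical choice of d)."""
    pairs = list(combinations(range(len(c1)), 2))
    return sum(abs(c1[i][j] - c2[i][j]) for i, j in pairs) / len(pairs)


def d_sequence(runs, start=3):
    """d(p, p+1) for p = start, ..., len(runs) - 1."""
    return [consensus_distance(coassociation(runs[:p]), coassociation(runs[:p + 1]))
            for p in range(start, len(runs))]


# Each list is the cluster labels of 4 observations from one k-means run.
runs = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
print(d_sequence(runs))  # one value, d(3,4) = 1/6
```

Because the matrix only records whether two points share a cluster, it does not matter that k-means assigns arbitrary label numbers in each run, so no matching of cluster labels across runs is needed.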

  > I thought I could represent a k-means run as a binary tree or do a
  > hierarchical agglomerative clustering of a matrix of cluster memberships
  > (1s and 0s) from p k-means runs, but maybe this isn't the best approach.
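A minimal sketch of the 0/1 membership matrix described above, assuming one column per (run, cluster) pair. Within one run every observation has exactly one 1, so two observations in the same cluster have identical entries for that run, and two in different clusters differ in exactly two columns. Half the Hamming distance between two rows is therefore the number of runs that split those two observations. Such counts could then be handed to a hierarchical clustering routine (for example as a distance matrix for R's `hclust()`). The function names are hypothetical.

```python
def indicator_matrix(runs):
    """Rows = observations; one 0/1 column per (run, cluster) pair."""
    columns = []
    for labels in runs:
        for k in sorted(set(labels)):
            columns.append([1 if lab == k else 0 for lab in labels])
    n = len(runs[0])
    return [[col[i] for col in columns] for i in range(n)]


def disagreement_counts(m):
    """Half the Hamming distance between rows = number of runs splitting i and j."""
    n = len(m)
    return [[sum(a != b for a, b in zip(m[i], m[j])) // 2 for j in range(n)]
            for i in range(n)]


runs = [[0, 0, 1], [0, 1, 1]]  # two runs, three observations
m = indicator_matrix(runs)
print(m)                       # [[1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 0, 1]]
print(disagreement_counts(m))  # [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
```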

  > So, is there a metric on two consensuses of k-means runs?  Or is there
  > another approach that I can implement in R?

There are several measures of similarity between partitions in the
literature (which of course can easily be transformed into distance
measures). The most popular is probably the corrected Rand index,
which is computed by the classAgreement() function in package e1071
(literature references are in the help page).
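For reference, a minimal sketch of the corrected (adjusted) Rand index in the Hubert-Arabie form, computed from the contingency table of two labelings of the same observations. As far as I know, classAgreement() reports this quantity as its `crand` component; the Python function names here are hypothetical. It equals 1 for identical partitions (regardless of how clusters are numbered) and is around 0 for agreement at chance level.

```python
from collections import Counter


def comb2(x):
    """Number of unordered pairs among x items."""
    return x * (x - 1) // 2


def adjusted_rand_index(a, b):
    """Corrected Rand index of two labelings of the same n >= 2 observations."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two labelings of equal length >= 2")
    pairs_both = sum(comb2(c) for c in Counter(zip(a, b)).values())  # cell pairs
    pairs_a = sum(comb2(c) for c in Counter(a).values())  # row-sum pairs
    pairs_b = sum(comb2(c) for c in Counter(b).values())  # column-sum pairs
    expected = pairs_a * pairs_b / comb2(len(a))
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (pairs_both - expected) / (maximum - expected)


print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0 (same partition)
print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))  # 4/7, about 0.571
```

A dissimilarity between two partitions can then be taken as `1 - adjusted_rand_index(a, b)`.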

I don't know whether any of these is a metric (in the mathematical sense)
on the space of partitions.

Hope this helps,

-- 
-------------------------------------------------------------------
                        Friedrich  Leisch 
Institut für Statistik                     Tel: (+43 1) 58801 10715
Technische Universität Wien                Fax: (+43 1) 58801 10798
Wiedner Hauptstraße 8-10/1071      Friedrich.Leisch at ci.tuwien.ac.at
A-1040 Wien, Austria             http://www.ci.tuwien.ac.at/~leisch
-------------------------------------------------------------------

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._