[R] Similarity matrix
Frank E Harrell Jr
fharrell at virginia.edu
Wed Apr 11 13:53:46 CEST 2001
Thanks very much to Brian Ripley, Kaspar Pflugshaupt, and Jari Oksanen
for addressing this issue.
The S-Plus online help sheds no light on the issue. The S-Plus
statistics manual has a lot of information on clustering, but it
covers only distance measures, since similarity measures
are accepted by only a minority of the clustering functions.
Brian Ripley ran the test that I should have run myself, showing
that S-Plus hclust applies a simple translation from similarity
to distance (dist = 1 - sim).
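
For anyone wanting the same behaviour in R, where hclust accepts only
dissimilarities, the translation can be applied by hand. What follows is
a minimal sketch, not a tested recipe: it assumes the similarities are
scaled to [0, 1], and it uses R's built-in longley data frame as a
stand-in for the S-Plus longley.y object in the quoted example.

```r
# Stand-in data (an assumption): R's built-in longley data frame.
d <- dist(longley)
d <- d / max(d)              # scale distances into [0, 1]
sim <- 1 - as.matrix(d)      # pretend we were handed a similarity matrix

# R's hclust() has no sim= argument, so convert with dist = 1 - sim.
d_from_sim <- as.dist(1 - sim)
fit <- hclust(d_from_sim, method = "average")

# Merge heights mapped back to the similarity scale.
sim_height <- 1 - fit$height
```

The tree in fit is built on the distance scale; sim_height only relabels
the merge heights as similarities.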
The kinds of similarities I routinely use are
- pairwise squared Spearman rank correlation coefficients
- pairwise proportion of the time that two variables are
  missing on the same observation
- Hoeffding D nonparametric dependence index
  (the scaling of which may be more problematic than the other two)
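
As a rough illustration of the first two similarities in base R, here is
a sketch under stated assumptions: the built-in airquality data frame
stands in for real data, and the object names (x, rho2, both_na) are
only illustrative. Hoeffding's D is left out because base R has no
function for it.

```r
# Stand-in data (an assumption): airquality ships with R and contains NAs.
x <- airquality[, c("Ozone", "Solar.R", "Wind", "Temp")]

# Similarity 1: pairwise squared Spearman rank correlations.
rho2 <- cor(x, method = "spearman", use = "pairwise.complete.obs")^2

# Similarity 2: proportion of observations on which both variables
# are missing (the diagonal is each variable's own missing fraction).
na <- 1 * is.na(x)                   # 0/1 indicator matrix
both_na <- crossprod(na) / nrow(x)

# Cluster variables on the first similarity via dist = 1 - sim.
fit <- hclust(as.dist(1 - rho2), method = "average")
```

as.dist ignores the diagonal, so the fact that both_na does not have
ones on its diagonal would not matter if it were converted the same way.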
Thank you all,
Frank Harrell
Prof Brian Ripley wrote:
>
> On Tue, 10 Apr 2001, Frank E Harrell Jr wrote:
>
> > I frequently use hclust on a similarity matrix. In R only a
> > distance matrix is allowed. Is there a simple reliable
> > transformation of a similarity matrix that will result
> > in a distance matrix making hclust work the same as
> > S-Plus with a similarity matrix? Venables & Ripley 3rd
> > edition implies that a simple reversal of values
> > will suffice. Thanks -Frank
>
> Testing with Splus 6.0 shows that dist = 1 - sim is used there, so the
> simple assumption is correct.
>
> d <- dist(longley.y)
> d <- d/max(d)
> hclust(d, "ave")
> $merge:
> [,1] [,2]
> [1,] -2 -4
> [2,] -6 -8
> [3,] -1 -3
> [4,] -14 -15
> [5,] -10 -11
> [6,] -5 2
> [7,] -9 -12
> [8,] -13 5
> [9,] 1 3
> [10,] -16 4
> [11,] -7 7
> [12,] 8 10
> [13,] 6 11
> [14,] 9 13
> [15,] 12 14
>
> $height:
> [1] 0.006262043 0.011753372 0.014643545 0.022447014 0.030057803 0.046146438
> [7] 0.047591522 0.061849713 0.087427750 0.106310219 0.123025045 0.153018638
> [13] 0.221579969 0.384352922 0.570969820
>
> $order:
> [1] 13 10 11 16 14 15 2 4 1 3 5 6 8 7 9 12
>
> hclust(sim=1-d, method="ave")
> $merge:
> [,1] [,2]
> [1,] -2 -4
> [2,] -6 -8
> [3,] -1 -3
> [4,] -14 -15
> [5,] -10 -11
> [6,] -5 2
> [7,] -9 -12
> [8,] -13 5
> [9,] 3 1
> [10,] -16 4
> [11,] -7 7
> [12,] 10 8
> [13,] 11 6
> [14,] 13 9
> [15,] 14 12
>
> $height:
> [1] 0.9937379 0.9882466 0.9853565 0.9775530 0.9699422 0.9538536 0.9524085
> [8] 0.9381503 0.9125723 0.8936898 0.8769749 0.8469813 0.7784200 0.6156471
> [15] 0.4290302
>
> $order:
> [1] 7 9 12 5 6 8 1 3 2 4 16 14 15 13 10 11
>
> which is the same tree, with the heights expressed as similarities
> (1 - distance).
>
> --
> Brian D. Ripley, ripley at stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272860 (secr)
> Oxford OX1 3TG, UK Fax: +44 1865 272595
--
Frank E Harrell Jr Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine http://hesweb1.med.virginia.edu/biostat