[R] Similarity matrix
Kaspar Pflugshaupt
pflugshaupt at geobot.umnw.ethz.ch
Wed Apr 11 14:25:22 CEST 2001
On Wednesday 11 April 2001 10:23, Prof Brian Ripley wrote:
> And what does S-PLUS use? (Which is the point here?)
I've never done cluster analysis with S-Plus. But let's see:
The statistical manual for S-Plus 5.1/Unix fails to even mention similarity
matrices.
help(hclust) (in S-Plus 5.1/Unix and 3.4/Unix) says
USAGE:
hclust(dist, method = "compact", sim =)
[...]
sim=
structure giving similarities rather than distances. This can
either be a symmetric matrix or a vector with a "Size"
attribute. Missing values are not allowed.
The help text does not explain how the conversion to distances is done,
though. And the source is not available...
> I guess we have to experiment?
Well, I've taken the time to do it for you (S-Plus 3.4/Unix):
mat <- matrix(runif(100), nrow=10)
print(1 - plclust(hclust( sim=mat ))$yn) # 1 - ...: S-Plus seems to mirror
# the tree's y scale when given a similarity matrix
gives the same values as
print(plclust(hclust( 1-mat ))$yn)
but different values from
print(plclust(hclust( sqrt(1-mat) ))$yn)
The grouping structure is constant, anyway.
So, S-Plus seems to use D=1-S rather than D=sqrt(1-S) internally.
For R, it might be a good idea to let the user choose the conversion method
via an additional parameter, making D=1-S the default.
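To make the suggestion concrete, here is a minimal sketch in R of what such a user-selectable conversion could look like. The helper sim2dist() and its "conversion" argument are hypothetical names invented for this illustration; they are not part of R or S-Plus. The sketch assumes similarities in [0, 1] and uses R's method = "complete", which I take to correspond to S-Plus's "compact". Since sqrt() is monotone, complete linkage should produce the same merges under both conversions, with only the height scale changing.

```r
# Hypothetical helper (not an existing R or S-Plus function): convert a
# symmetric similarity matrix with entries in [0, 1] into a "dist" object.
sim2dist <- function(S, conversion = c("1-S", "sqrt(1-S)")) {
  conversion <- match.arg(conversion)   # default is the first choice, "1-S"
  S <- as.matrix(S)
  stopifnot(nrow(S) == ncol(S), isSymmetric(unname(S)),
            all(S >= 0 & S <= 1))
  D <- switch(conversion,
              "1-S"       = 1 - S,
              "sqrt(1-S)" = sqrt(1 - S))
  as.dist(D)                            # keeps only the lower triangle
}

set.seed(1)
S <- matrix(runif(100), nrow = 10)
S <- (S + t(S)) / 2                     # force symmetry
diag(S) <- 1                            # every object is fully similar to itself

h1 <- hclust(sim2dist(S, "1-S"),       method = "complete")
h2 <- hclust(sim2dist(S, "sqrt(1-S)"), method = "complete")

# Compare the grouping structure and the height scale of the two trees.
same_merges  <- identical(h1$merge, h2$merge)
same_heights <- isTRUE(all.equal(h2$height, sqrt(h1$height)))
print(c(same_merges = same_merges, same_heights = same_heights))
```

Note that this invariance holds for rank-based methods such as single and complete linkage; for average linkage or Ward-type methods the two conversions can lead to different trees, which is one more reason to let the user choose.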
According to Legendre & Legendre, the choice of similarity coefficient
_does_ make a difference as to which conversion should be preferred. For some
"species" of similarity coefficients, the resulting distance would be metric
and Euclidean with one method but not with the other, for others vice versa.
I don't know if this matters for cluster analysis, but I think that it might,
especially when clustering with a Euclidean metric.
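As a small illustration of that point, the following R sketch checks whether a distance matrix is Euclidean, using the classical criterion that the doubly centred matrix of squared distances must have no negative eigenvalues. The function is_euclidean() and its tolerance are my own assumptions for this example, not an existing R function. The data are four binary "sites" with the simple matching coefficient, for which D=1-S should come out metric but not Euclidean, while D=sqrt(1-S) is Euclidean.

```r
# Hypothetical helper: TRUE if the distances in d can be embedded in a
# Euclidean space (Gower's criterion on the doubly centred matrix).
is_euclidean <- function(d, tol = 1e-8) {
  D2 <- as.matrix(d)^2
  n  <- nrow(D2)
  J  <- diag(n) - matrix(1 / n, n, n)       # centring matrix
  B  <- -0.5 * J %*% D2 %*% J
  ev <- eigen((B + t(B)) / 2, symmetric = TRUE, only.values = TRUE)$values
  min(ev) >= -tol * max(abs(ev))            # allow for rounding error
}

# Four objects described by two binary variables (corners of a square).
X <- rbind(c(0, 0), c(0, 1), c(1, 1), c(1, 0))

# Simple matching coefficient: proportion of variables that agree.
S <- 1 - as.matrix(dist(X, method = "manhattan")) / ncol(X)

is_euclidean(as.dist(1 - S))        # FALSE
is_euclidean(as.dist(sqrt(1 - S)))  # TRUE
```

Other coefficients may behave the other way round, so this check is only meant as a quick way to see what a given conversion does to a given matrix.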
Cheers (hoping this was to the point :-)
Kaspar Pflugshaupt
--
Kaspar Pflugshaupt
Geobotanical Institute
ETH Zurich, Switzerland