[R] more clustering questions

Thu Dec 9 17:26:51 CET 2004

Dear Thomas,

Classical MDS tries to represent all distances simultaneously as well as
possible. It is based on something like a quadratic loss function, which
means that the optimization concentrates particularly on representing the
large distances adequately. The cmdscale result appears to represent at
least the 10s in the distance matrix better than the solution you expected
to obtain. This explains, for example, why s5 ends up closer to s2 than to
s1 in the plot.
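
One way to check this on the matrix from your message is to compare the
original dissimilarities with the distances in the two-dimensional
configuration. This is only a minimal sketch: the matrix is retyped from the
quoted message, and the names labs, fit, fitted, pairs and cmp are my own
choices, not anything from your code.

```r
# Dissimilarity matrix retyped from the quoted message
labs <- paste0("s", 1:5)
dissmini <- matrix(c( 0,  5, 10,  2,  3,
                      5,  0, 10, 10,  5,
                     10, 10,  0, 10, 10,
                      2, 10, 10,  0, 10,
                      3,  5, 10, 10,  0),
                   nrow = 5, byrow = TRUE, dimnames = list(labs, labs))

fit <- cmdscale(as.dist(dissmini), k = 2)  # classical MDS in 2 dimensions
fitted <- as.matrix(dist(fit))             # distances between the plotted points

# Original vs. fitted distance for a few pairs of interest
pairs <- rbind(c("s1", "s4"), c("s1", "s5"), c("s2", "s5"), c("s1", "s2"))
cmp <- data.frame(pair     = paste(pairs[, 1], pairs[, 2], sep = "-"),
                  original = dissmini[pairs],
                  fitted   = round(fitted[pairs], 2))
print(cmp)
```

Pairs whose fitted distance differs strongly from the original one are
those that the two-dimensional solution had to sacrifice.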

A method better suited to representing the small distances properly is
Kruskal's nonmetric MDS, which is available as the function isoMDS in
package MASS.
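
A minimal sketch of how this could be tried on your test matrix follows. It
assumes that package MASS is installed; the matrix is retyped from the
quoted message, and the names labs and nm are my own. I have not tuned
anything here, so take it as a starting point rather than a recommended
analysis.

```r
library(MASS)  # provides isoMDS

# Dissimilarity matrix retyped from the quoted message
labs <- paste0("s", 1:5)
dissmini <- matrix(c( 0,  5, 10,  2,  3,
                      5,  0, 10, 10,  5,
                     10, 10,  0, 10, 10,
                      2, 10, 10,  0, 10,
                      3,  5, 10, 10,  0),
                   nrow = 5, byrow = TRUE, dimnames = list(labs, labs))

# Kruskal's nonmetric MDS in 2 dimensions; by default it starts from the
# cmdscale configuration. trace = FALSE suppresses the iteration log.
nm <- isoMDS(as.dist(dissmini), k = 2, trace = FALSE)

# Plot the configuration with the item labels instead of points
plot(nm$points, type = "n", xlab = "", ylab = "")
text(nm$points, labels = labs)

nm$stress  # final stress, reported in percent
```

Nonmetric MDS uses only the rank order of the dissimilarities, so the many
tied 10s constrain the solution less than they do in cmdscale.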

Best,
Christian

On Thu, 9 Dec 2004, Dr. Thomas Isenbarger wrote:

> Sorry to bother you kind folks again with my questions.  I am trying to 
> learn as much as I can about all this, and I will admit that I don't 
> have the proper background, but I hope that someone can at least point 
> me in the correct direction.
> 
> I have created a test matrix for what I want to do:
> 
>     s1 s2 s3 s4 s5
> s1 10  5  0  8  7
> s2  5 10  0  0  5
> s3  0  0 10  0  0
> s4  8  0  0 10  0
> s5  7  5  0  0 10
> 
> This is a similarity matrix (let's call it "mini") I created to run some 
> tests.  Thus, a self-against-self analysis gives a score of 10, and 
> lower scores denote lower degrees of similarity (8 denotes two items 
> that are almost the same, etc.).  s1 is closely related to s4 and s5, 
> but slightly more closely related to s4.  s2 is related at a similar, 
> medium level to s1 and s5.
> 
> I converted this into a dissimilarity matrix with R using
> 
> dissmini <- max(mini)-mini
> 
> this results in:
> 
>     s1 s2 s3 s4 s5
> s1  0  5 10  2  3
> s2  5  0 10 10  5
> s3 10 10  0 10 10
> s4  2 10 10  0 10
> s5  3  5 10 10  0
> 
> if I then do
> 
> loc <- cmdscale(dissmini)
> plot(loc, type="n"); text(loc, row.names(loc))
> 
> I end up with a plot that shows (among other things) s2 and s5 very 
> close together, closer together than s1-s5 or s1-s2 or s1-s4.   This is 
> the opposite of what I would predict and what I want the plot to show.
> 
> If I instead use
> 
> plot(cmdscale(as.dist(dissmini)))
> 
> the plot is the same.
> 
> something like this:
> 
>                      s4
> 
>      s1
> 
> 
> s5
>   s2
> 
>                                                     s3
> 
> Thanks for your help,
> Tom Isenbarger
> 
> 
> 
> --
> isen at plantpath.wisc.edu
> thomas a isenbarger
> (608) 265-0850
> 

***********************************************************************
Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
hennig at math.uni-hamburg.de, http://www.math.uni-hamburg.de/home/hennig/
#######################################################################
I recommend www.boag-online.de