[R] Question about PAM clustering method
Martin Maechler
maechler at stat.math.ethz.ch
Sat May 3 12:50:48 CEST 2003
>>>>> "Isogai" == Isogai Takashi <t_isog at hotmail.com>
>>>>> on Fri, 18 Apr 2003 08:57:15 +0000 writes:
(sorry for the late reaction, your e-mail got buried in a pile...)
Isogai> Hello everyone. I just started learning R for
Isogai> clustering analysis in my research project. I tried
Isogai> k-means method and PAM method, both of which were
Isogai> properly processed with my data. I have some
Isogai> questions about PAM graphical output.
Isogai> Suppose I run the commands shown below:
Isogai> pm <- pam(D,6) ; plot(pm)
Isogai> I got two charts after being prompted. In the first
Isogai> chart, 6 oval clusters are drawn together with data
Isogai> markers. I see four 'pink' lines that connect oval
Isogai> clusters. In this case, oval clusters are located
Isogai> very near, and some of them are overlapped. The
Isogai> line starts from the edge of one oval, and it ends
Isogai> at the edge of another oval. Does anyone know the
Isogai> meaning of this line? I imagine that the line shows
Isogai> close linkage of the corresponding clusters, but no
Isogai> comments regarding this line can be found in the
Isogai> help documents.
help(clusplot.default) has quite a list of references, one of
them even available on the web. You should read at least one
(or alternatively look at the R source of these functions; this
is open source!)
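For what it is worth, help(clusplot.default) documents a `lines`
argument that controls these segments: with lines = 0 none are drawn,
with lines = 1 a segment joins the two ellipse centers, and with
lines = 2 (the default) it runs between the ellipse boundaries along
the line through the centers; no segment is drawn where two ellipses
overlap along that line. A minimal sketch to compare the settings; the
scaled iris measurements and k = 3 are stand-ins for your D and 6, not
your data:

```r
library(cluster)

# Stand-in data (assumption): the four numeric columns of iris, standardized.
x <- scale(iris[, 1:4])
pm <- pam(x, 3)

# Same clustering, three settings of 'lines' (passed on to clusplot.default).
op <- par(mfrow = c(1, 3))
clusplot(pm, lines = 0, main = "lines = 0")
clusplot(pm, lines = 1, main = "lines = 1")
clusplot(pm, lines = 2, main = "lines = 2")
par(op)
```

So the segments are best read as a rough picture of the distance
between two ellipses in the 2-d projection, not as a separate linkage
criterion.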
BTW (to all readers!):
Use
    library(cluster, keep.source = TRUE)
                     ^^^^^^^^^^^^^^^^^^
for looking at the source code,
or instead even set
    options(keep.source.pkgs = TRUE)
in your Rprofile.
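Once the package is attached, one way to get at the function itself is
sketched here. It relies only on getS3method() from base R; the name
`f` is just a placeholder for this illustration:

```r
library(cluster)

# Fetch the default method of the clusplot() generic.
f <- getS3method("clusplot", "default")

# Show the argument list, then the first lines of the function body.
print(args(f))
writeLines(head(deparse(f), 20))
```

Typing the function name at the prompt prints the whole definition;
comments from the original source file only show up if the source was
kept when the package was installed or loaded.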
Isogai> Second question is the meaning of the comment "these
Isogai> two components explain x% of the point variability"
Isogai> at the bottom of the graph. In my case, the data
Isogai> has 6 (groups) x 20 (properties) dimension. I think
Isogai> that R extracts the first and the second factors, and
Isogai> maps them on the graph. Therefore, the number is the
Isogai> total contribution of those two factors. Am I
Isogai> correct? If so, how can I choose the factors other
Isogai> than the first or the second?
help(clusplot[.default]) says that these are principal components
or MDS coordinates, depending on the input.
Choosing anything other than the first two PCs is currently not
possible unless you change clusplot.default() yourself.
It shouldn't be too hard, and I will gladly accept patches that
implement such a new feature.
Even more interesting would be to provide other projection-like
coordinates (from the literature) for cluster representation!
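Until then, a workaround is to compute the components yourself and
plot whichever pair you like. The sketch below is not what clusplot()
draws (no ellipses, no shading), and it assumes princomp() on the
correlation scale, which may differ from what clusplot.default() does
for your input; check the source to be sure. The iris columns, k = 3
and the choice of components 2 and 3 are only illustrative:

```r
library(cluster)

x <- scale(iris[, 1:4])   # stand-in data (assumption)
pm <- pam(x, 3)

# Principal components on the correlation scale (assumed, see above).
pc <- princomp(x, cor = TRUE)
prop <- pc$sdev^2 / sum(pc$sdev^2)   # share of variability per component
explained12 <- sum(prop[1:2])        # share carried by components 1 and 2
print(round(100 * cumsum(prop), 1))  # cumulative percentages

# Hand-made scatter plot of two other components, marked by PAM cluster.
comp <- c(2, 3)                      # hypothetical choice of components
plot(pc$scores[, comp], col = pm$clustering, pch = pm$clustering,
     xlab = paste("Component", comp[1]),
     ylab = paste("Component", comp[2]))
points(pc$scores[pm$id.med, comp], pch = 8, cex = 2)  # mark the medoids
```

Here `explained12` plays the role of the "x%" in the plot's subtitle,
under the stated assumption about scaling.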
Isogai> Lastly, I read a document that says about the
Isogai> average silhouette, "even that highest width is
Isogai> below (say) 0.25, one may conclude that no
Isogai> substantial structure has been found". Is this
Isogai> true? In my case, the value is far below 0.25,
Isogai> possibly because some clusters overlap on the graph.
Isogai> I can accept the overlapping clusters from the
Isogai> viewpoint of my research, but I wonder if the PAM
Isogai> method is also useful for these clusters.
The reference on clusplot is really Rousseeuw, Struyf and coworkers.
So I assume you read this in one of the references in
help(clusplot.default) and they do give some indications on
this. I believe the silhouette widths to be one "goodness of fit"
measure, useful in many but not all cases. You have to *think*
about what these clusters *mean* in your data situation (and
also about what you really want to achieve with the clustering).
These real questions can never be answered by a single ASW (average
silhouette width) or any other statistic.
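If you want to look at the numbers anyway, a pam() result carries them
in its `silinfo` component. A small sketch, again with the scaled iris
columns as stand-in data and an arbitrary range of k, that compares the
average silhouette width across several numbers of clusters:

```r
library(cluster)

x <- scale(iris[, 1:4])   # stand-in data (assumption)

# Average silhouette width (ASW) for k = 2, ..., 8 clusters.
ks <- 2:8
asw <- sapply(ks, function(k) pam(x, k)$silinfo$avg.width)
names(asw) <- ks
print(round(asw, 3))

# Per-cluster average widths for one chosen k (3 is an arbitrary pick).
pm <- pam(x, 3)
print(round(pm$silinfo$clus.avg.widths, 3))
```

A low ASW for every k suggests the clusters overlap a lot, but whether
that matters still depends on what you want from the clustering.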
Isogai> Thank you very much for your help in advance.
Isogai> T. Isog Tokyo, Japan
You're welcome.
Martin Maechler <maechler at stat.math.ethz.ch> http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum LEO C16 Leonhardstr. 27
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-3408 fax: ...-1228 <><