[R] Question about PAM clustering method

Martin Maechler maechler at stat.math.ethz.ch
Sat May 3 12:50:48 CEST 2003


>>>>> "Isogai" == Isogai Takashi <t_isog at hotmail.com>
>>>>>     on Fri, 18 Apr 2003 08:57:15 +0000 writes:

(sorry for the late reaction, your e-mail got buried in a pile...)

    Isogai> Hello everyone.  I just started learning R for
    Isogai> clustering analysis in my research project.  I tried
    Isogai> k-means method and PAM method, both of which were
    Isogai> properly processed with my data.  I have some
    Isogai> questions about PAM graphical output.

    Isogai> Suppose I run the commands shown below:
    Isogai>  pm <- pam(D,6) ;  plot(pm)

    Isogai> I got two charts after prompted.  In the first
    Isogai> chart, 6 oval clusters are drawn together with data
    Isogai> markers.  I see four 'pink' lines that connect oval
    Isogai> clusters.  In this case, oval clusters are located
    Isogai> very near, and some of them are overlapped.  The
    Isogai> line starts from the edge of one oval, and it ends
    Isogai> at the edge of another oval.  Does anyone know the
    Isogai> meaning of this line?  I imagine that the line shows
    Isogai> close linkage of the corresponding clusters, but no
    Isogai> comments regarding this line can be found in the
    Isogai> help documents.

help(clusplot.default) has quite a list of references, one of
them even available on the web.  You should read at least one
(or alternatively look at the R source of these functions; this
is open source!)
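
For orientation: help(clusplot.default) documents a `lines' argument
that controls these segments (0 = none, 1 = between the ellipse
centers, 2 = between the ellipse boundaries, which is the default).
A minimal sketch, assuming a simulated matrix D as a hypothetical
stand-in for the poster's data:

```r
library(cluster)

set.seed(1)
# Hypothetical stand-in for the poster's data: 60 observations, 4 variables
D <- rbind(matrix(rnorm(80, mean = 0), ncol = 4),
           matrix(rnorm(80, mean = 3), ncol = 4),
           matrix(rnorm(80, mean = 6), ncol = 4))

pm <- pam(D, 3)

# Default: segments drawn between the boundaries of the ellipses
res2 <- clusplot(pm, lines = 2)

# No connecting segments at all
res0 <- clusplot(pm, lines = 0)

# With lines = 1 or 2, the invisibly returned list should carry a
# k x k matrix of distances between the ellipses (see the help page)
print(res2$Distances)
```

Comparing the two plots shows which marks on the chart are the
distance segments and which belong to the ellipses themselves.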

BTW (to all readers!):

  Use   
	    library(cluster,  keep.source = TRUE)
			      ^^^^^^^^^^^^^^^^^^
  for looking at the source code
  or instead even set

	    options(keep.source.pkgs = TRUE)
 
  in your Rprofile.
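
  A hedged note for readers on newer R versions: the keep.source
  argument of library() may no longer be accepted there.  Retrieving
  the function object is usually enough to read the code, although
  the original comments may be missing.  A minimal sketch:

```r
library(cluster)

# Fetch the default method of the clusplot() generic and print it
f <- getS3method("clusplot", "default")
print(f)

# getAnywhere() also finds objects that a namespace does not export
getAnywhere("clusplot.default")
```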
      
   
    Isogai> Second question is the meaning of the comment "these
    Isogai> two components explain x% of the point variability"
    Isogai> at the bottom of the graph.  In my case, the data
    Isogai> has 6 (groups) x 20 (properties) dimension.  I think
    Isogai> that R extracts the first and the second factors, and
    Isogai> map them on the graph.  Therefore, the number is the
    Isogai> total contribution of those two factors.  Am I
    Isogai> correct?  If so, how can I choose the factors other
    Isogai> than the first or the second?

help(clusplot[.default]) says that these are principal components
(for a data matrix) or MDS coordinates (for dissimilarities),
depending on the input.

Choosing components other than the first two PCs is currently not
possible unless you modify clusplot.default() yourself.
That shouldn't be too hard, and I will gladly accept patches which
implement such a new feature.
Even more interesting would be to provide other projection-like
coordinates (from the literature) for cluster representation!
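
Until such a feature exists, other components can be inspected by
hand.  The following is only a sketch under stated assumptions: D is
simulated, hypothetical data, and prcomp() on standardised variables
is a choice made for this example, not necessarily the computation
clusplot.default() performs internally.  It draws a plain scatterplot
of two chosen components, without the ellipses.

```r
library(cluster)

set.seed(1)
# Hypothetical data: 60 observations, 5 variables
D <- rbind(matrix(rnorm(100, mean = 0), ncol = 5),
           matrix(rnorm(100, mean = 3), ncol = 5),
           matrix(rnorm(100, mean = 6), ncol = 5))
pm <- pam(D, 3)

# Principal components of the standardised data
pc <- prcomp(D, scale. = TRUE)

# Share of the total variance carried by each component
var.share <- pc$sdev^2 / sum(pc$sdev^2)

# Pick any pair of components, e.g. the 3rd and the 4th
ij <- c(3, 4)
plot(pc$x[, ij], col = pm$clustering, pch = pm$clustering,
     xlab = paste("Component", ij[1]),
     ylab = paste("Component", ij[2]),
     sub = sprintf("These two components explain %.2f %% of the point variability.",
                   100 * sum(var.share[ij])))
```

The percentage in the subtitle is the summed variance share of the
two plotted components, so it will be smaller for later components
than for the first two.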

    Isogai> Lastly, I read a document that says about the
    Isogai> average silhouette, "even that highest width is
    Isogai> below (say) 0.25, one may conclude that no
    Isogai> substantial structure has been found".  Is this
    Isogai> true?  In my case, the value is far below 0.25,
    Isogai> possibly because some clusters overlap on the graph.
    Isogai> I can accept the overlapping clusters from the
    Isogai> viewpoint of my research, but I wonder if the PAM
    Isogai> method is also useful for these clusters.

The reference on clusplot is really Rousseeuw, Struyf and coworkers.
So I assume you read this in one of the references in
help(clusplot.default)  and they do give some indications on
this.  I believe the silhouette widths to be one "goodness of fit"
measure, useful in many but not all cases.  You have to *think*
about what these clusters *mean* in your data situation (and
also about what you really want to achieve with the clustering).
These real questions can never be answered by a single ASW (average
silhouette width) or any other statistic.
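
For looking at the numbers themselves, a pam() fit stores the
silhouette information in its `silinfo' component.  A minimal sketch,
again on simulated, hypothetical data D and with an arbitrarily
chosen range of k:

```r
library(cluster)

set.seed(1)
# Hypothetical data: 60 observations, 5 variables
D <- rbind(matrix(rnorm(100, mean = 0), ncol = 5),
           matrix(rnorm(100, mean = 3), ncol = 5),
           matrix(rnorm(100, mean = 6), ncol = 5))

# Average silhouette width (ASW) for several numbers of clusters
ks <- 2:6
asw <- sapply(ks, function(k) pam(D, k)$silinfo$avg.width)
names(asw) <- ks
print(round(asw, 3))

# Per-cluster averages and the silhouette plot for one fit
pm <- pam(D, 3)
print(pm$silinfo$clus.avg.widths)
plot(silhouette(pm))
```

The per-cluster widths often say more than the single average: one
poorly separated cluster can pull the ASW down while the others are
quite distinct.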

    Isogai> Thank you very much for your help in advance.
    Isogai> T. Isog Tokyo, Japan

You're welcome.
Martin Maechler <maechler at stat.math.ethz.ch>	http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16	Leonhardstr. 27
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1228			<><


