[R] doubts about Silhouette

Wed Oct 17 01:28:52 CEST 2007

Sorry for the long message. I'm doing my best to try to explain myself.

I have fitted a spline to my data and filled in
the missing data by replicating the spline coefficients associated with
the last node. I obtained a number of dendrograms from different
combinations of distance and linkage method by calling DIST and AGNES.
The agglomerative coefficient is very high (~ 0.99) for some
combinations, and is generally around 0.5 for the remaining cases.
As recommended, I ran SILHOUETTE at different cuts (CUTREE) for
some of the cases. Regardless of the AC value, the highest silhouette
width I get is ~ 0.4 or lower, which is too low a level of confidence
for accepting any clustering structure, from what I have recently
read.
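
A minimal sketch of the procedure described above, run on simulated data.
The matrix `coefs` is a hypothetical stand-in for the spline coefficients,
and the Euclidean distance and average linkage are only example choices,
not necessarily the combinations used in the actual analysis.

```r
library(cluster)

set.seed(1)
# Hypothetical stand-in for the spline coefficients: 60 curves x 5 coefficients
coefs <- rbind(matrix(rnorm(150, mean = 0), ncol = 5),
               matrix(rnorm(150, mean = 3), ncol = 5))

d  <- dist(coefs, method = "euclidean")
ag <- agnes(d, method = "average")
ag$ac                                  # agglomerative coefficient

# Average silhouette width for cuts of the tree into k = 2..6 clusters
ks <- 2:6
avg_width <- sapply(ks, function(k) {
  cl <- cutree(as.hclust(ag), k = k)
  mean(silhouette(cl, d)[, "sil_width"])
})
names(avg_width) <- ks
avg_width
```

Comparing `ag$ac` with `avg_width` on the same fit shows the two
quantities side by side for each candidate cut.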

My first question is about the use of CUTREE and SILHOUETTE. I've just
found in the QA archive a statement that CUTREE expects an object
of type "hclust" as input, whereas I've always passed an object of
type "agnes". I tried converting the agnes object to an hclust one and
carried out the silhouette analysis again, but got the same results.
Is such a conversion an important step to get the right answer?
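
The comparison can be sketched as follows, on hypothetical random data
(`x` is made up, and Ward linkage is only an example). It cuts the same
tree once from the agnes object and once after as.hclust(), then checks
whether the two partitions agree.

```r
library(cluster)

set.seed(2)
x  <- matrix(rnorm(200), ncol = 4)     # hypothetical data, 50 observations
ag <- agnes(x, metric = "euclidean", stand = FALSE, method = "ward")

cl_agnes  <- cutree(ag, k = 3)              # agnes object passed directly
cl_hclust <- cutree(as.hclust(ag), k = 3)   # explicit conversion first

# Compare the two partitions element by element, ignoring names
same <- all(unname(cl_agnes) == unname(cl_hclust))
same
```

This only covers cutting by the number of clusters `k`. Cutting by height
`h` is a different case that the sketch does not test.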

My second doubt is about the agnes parameter STAND.
I centered and standardized the raw data involved in the
clustering process before fitting the spline. But I have not
standardized the actual data I use for clustering, that is, the spline
coefficients, and I have always set the STAND parameter to
FALSE.
Is this going to affect all my analysis results, which may as a
consequence turn out to be misleading?
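
One way to gauge the effect is to fit the same tree with and without
standardization and compare the outcomes. The sketch below uses
hypothetical coefficients on deliberately different scales. Note that
stand = TRUE in agnes divides by the mean absolute deviation, whereas
scale() divides by the standard deviation, so the two are not identical.

```r
library(cluster)

set.seed(3)
# Hypothetical coefficients on very different scales
coefs <- cbind(a = rnorm(40, sd = 100), b = rnorm(40, sd = 1))

ag_raw   <- agnes(coefs, stand = FALSE, method = "average")
ag_stand <- agnes(coefs, stand = TRUE,  method = "average")
ag_scale <- agnes(scale(coefs), stand = FALSE, method = "average")

# Cross-tabulate the 3-cluster partitions to see how much they differ
tab <- table(raw   = cutree(as.hclust(ag_raw),   k = 3),
             stand = cutree(as.hclust(ag_stand), k = 3))
tab

# Agglomerative coefficients of the three fits
acs <- c(raw = ag_raw$ac, stand = ag_stand$ac, scale = ag_scale$ac)
acs
```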

In your experience, whenever the agglomerative hierarchical clustering
approach does not yield a satisfactory level of confidence in the
clustering structure, is it worthwhile to try a partitioning method?
That is, if the hierarchical approach does not reveal
any pattern in the data, is a partitioning method a valuable
alternative?
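
For concreteness, a partitioning run of the kind meant here could look
like the sketch below, using pam() from the cluster package on
hypothetical data. The matrix `coefs` and the range of k are assumptions
made for illustration.

```r
library(cluster)

set.seed(4)
# Hypothetical data: two groups of 25 observations, 4 variables each
coefs <- rbind(matrix(rnorm(100, mean = 0), ncol = 4),
               matrix(rnorm(100, mean = 4), ncol = 4))

# Average silhouette width of the PAM partition for k = 2..6
ks <- 2:6
pam_width <- sapply(ks, function(k) pam(coefs, k = k)$silinfo$avg.width)
names(pam_width) <- ks
pam_width

# Number of clusters with the largest average silhouette width
best_k <- ks[which.max(pam_width)]
best_k
```

The average widths are on the same scale as those obtained from
SILHOUETTE on a cut tree, so the two approaches can be compared directly.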

If the partitioning approach does not reveal any particular data
structure either, then probably the raw data have to be pre-processed further
before attempting any clustering at all, or I have chosen the wrong
variables in the data set.
I'm somehow trying to cluster signal amplitudes that are not referred
to the same phase values. For instance, if I have two sinusoids
sampled at different phase values like:
{sin(0), sin(1/4PI), sin(3/2PI)}  and  {cos(1/8PI), cos(1/2PI), cos(5/2PI)}
is it meaningful to cluster the two sets of values above, or should I
first refer both signals to the same phase values before clustering,
like:

{sin(0), sin(1/4PI), sin(3/2PI)}  and  {cos(0), cos(1/4PI), cos(3/2PI)}

or:

{sin(1/8PI), sin(1/2PI), sin(5/2PI)}  and  {cos(1/8PI), cos(1/2PI), cos(5/2PI)}
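
Referring both signals to the same phase values could be done by
interpolation, as in the sketch below. The signals, the number of
samples, and the 20-point grid are all hypothetical, and cubic spline
interpolation via spline() is only one possible choice.

```r
set.seed(5)
# Two hypothetical signals sampled at different, irregular phase values
ph1 <- sort(runif(25, 0, 2 * pi))   # phases at which signal 1 was sampled
ph2 <- sort(runif(25, 0, 2 * pi))   # different phases for signal 2
s1  <- sin(ph1)
s2  <- cos(ph2)

# Common phase grid restricted to the range covered by both signals
lo   <- max(min(ph1), min(ph2))
hi   <- min(max(ph1), max(ph2))
grid <- seq(lo, hi, length.out = 20)

# Interpolate each signal onto the common grid with a cubic spline
s1_grid <- spline(ph1, s1, xout = grid)$y
s2_grid <- spline(ph2, s2, xout = grid)$y

# Each column now refers to the same phase in both rows
aligned <- rbind(s1 = s1_grid, s2 = s2_grid)
dist(aligned)
```

After this step, a distance between rows compares amplitudes taken at
identical phases rather than at unrelated ones.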

Thank you in advance for your attention.

Regards,

-- 
Maura E.M