[R] fuzzy classification and dissimilarity matrix

Mon Apr 10 17:39:48 CEST 2006

 [Replying to myself  :-) ...]

>>>>> "MM" == Martin Maechler <maechler at stat.math.ethz.ch>
>>>>>     on Fri, 7 Apr 2006 18:44:22 +0200 writes:

>>>>> "Jeanne" == Jeanne Vallet <Jeanne.Vallet at inh.fr>
>>>>>     on Fri, 7 Apr 2006 12:43:59 +0200 writes:

    Jeanne> Hello, I want to make a fuzzy classification from a
    Jeanne> dissimilarity matrix (calculated with daisy from
    Jeanne> package 'cluster'). I have tried to use fanny
    Jeanne> (package cluster) but I have the same problems as
    Jeanne> described in a previous message
    Jeanne> (http://tolstoy.newcastle.edu.au/R/help/05/05/4546.html),
    Jeanne> i.e. it always gives me two clusters in the results
    Jeanne> (even if k is different from 2) with the same
    Jeanne> memberships for all clusters. One solution suggested
    Jeanne> in reply to the previous message was to use cmeans from
    Jeanne> package e1071. The problem is that cmeans doesn't seem
    Jeanne> to work with a dissimilarity matrix, and I have to
    Jeanne> use daisy because I have mixed data. What
    Jeanne> can I do?

Thank you, Jeanne, for the example that I've received (off line)
from you.

    MM> This looks like a bug in fanny(), actually somewhere in
    MM> its Fortran code.  

I'm no longer sure that this is a bug; see below.

    MM> Following the above URL leads to a
    MM> reproducible example -- eventually! -- from Matthias
    MM> Temple, though a relatively large example.

    MM> I'd like to look at the bug and fix it as soon as
    MM> possible.  If you want you can also send me (privately,
    MM> not via R-help!)  your dissmilarity object {resulting
    MM> from daisy()}, or preferably your data set and the exact
    MM> calls you used {daisy(), then fanny()}, to produce the
    MM> outcome you mention above.

    MM> Regards, Martin Maechler, ETH Zurich

fanny() {from package "cluster"} has had an optional argument
'memb.exp = 2' for a while now.

?fanny contains

>> Arguments:
>>  
>>   ..................
>>  
>>   memb.exp: number r strictly larger than 1 specifying the _membership
>> 	    exponent_ used in the fit criterion; see the 'Details' below.
>> 	    Default: '2' which used to be hardwired inside FANNY.
>> 
>>   ..................
>> 
>> 
>> Details:
>> 
>>   ..................
>> 
>>      Fanny aims to minimize the objective function
>> 
>>    SUM_[v=1..k] (SUM_(i,j) u(i,v)^r u(j,v)^r d(i,j)) / (2 SUM_j u(j,v)^r)
>> 
>>      where n is the number of observations, k is the number of
>>      clusters, r is the membership exponent 'memb.exp' and d(i,j) is
>>      the dissimilarity between observations i and j. 
>>      Note that r -> 1 gives increasingly crisper clusterings whereas r
>>      -> Inf leads to complete fuzziness.  K&R (1990, p. 191) note that
>>      values too close to 1 can lead to slow convergence.
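To make the quoted criterion concrete, here is a minimal Python/NumPy sketch that only *evaluates* it for a given membership matrix; it is not the Fortran code inside fanny() and performs no minimization. It assumes that SUM_(i,j) runs over all ordered pairs i, j = 1..n (which is why the denominator carries the factor 2). The function name `fanny_objective` and the toy data are made up for illustration.

```python
import numpy as np

def fanny_objective(u, d, r=2.0):
    """Evaluate the objective quoted from ?fanny for memberships u (n x k),
    dissimilarities d (n x n, symmetric, zero diagonal) and exponent r > 1.
    Hypothetical helper; assumes the (i, j) sum is over all ordered pairs."""
    u = np.asarray(u, dtype=float)
    d = np.asarray(d, dtype=float)
    if r <= 1:
        raise ValueError("memb.exp (r) must be strictly larger than 1")
    w = u ** r                                # w[i, v] = u(i,v)^r
    # numerator per cluster v: SUM_(i,j) w[i,v] * w[j,v] * d(i,j)
    num = np.einsum("iv,jv,ij->v", w, w, d)
    den = 2.0 * w.sum(axis=0)                 # 2 * SUM_j u(j,v)^r
    return float(np.sum(num / den))

# Made-up toy data: four points on a line forming two obvious groups.
x = np.array([0.0, 1.0, 10.0, 11.0])
d = np.abs(x[:, None] - x[None, :])
u_crisp = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
u_fuzzy = np.full((4, 2), 0.5)                # "all is fuzzy": every u(i,v) = 1/k

print(fanny_objective(u_crisp, d))  # 1.0
print(fanny_objective(u_fuzzy, d))  # 5.25
```

On this toy data the crisp split scores far lower than uniform memberships, which is the kind of solution the criterion is meant to prefer when the groups are well separated.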

If you have read this, it may seem pretty natural to try setting
'memb.exp = 1.5' or even 'memb.exp = 1.2' because that is said
to lead to less fuzzy clusterings.

And indeed, both for your example and for the one mentioned earlier
in this thread, setting 'memb.exp' to values less than 2, and
particularly to values ``closer to 1 than to 2'', gives much
better results: you no longer get the ``all is fuzzy'' result
you had, where the memberships u(i,v) were all (approximately) 1/k.

For the next version of 'cluster',
I'll add a bit more to the help page, *and* a warning plus a note
about modifying 'memb.exp' in situations like these, where
"complete fuzziness" has resulted.
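As a rough illustration of what such a check could look like (this is not the code that will go into 'cluster'), here is a Python/NumPy sketch. It computes Dunn's partition coefficient F(k) = (sum of squared memberships) / n, which lies between 1/k and 1, and its normalized form (F(k) - 1/k) / (1 - 1/k), which is 0 under complete fuzziness. fanny() reports both values in its result. The helper names `dunn_coefficients` and `check_fuzziness`, and the tolerance `tol`, are hypothetical.

```python
import warnings
import numpy as np

def dunn_coefficients(u):
    """Dunn's partition coefficient F(k) and its normalized form for an
    (n, k) membership matrix u whose rows sum to 1 (k >= 2 assumed)."""
    u = np.asarray(u, dtype=float)
    n, k = u.shape
    if k < 2:
        raise ValueError("need at least two clusters")
    f = float(np.sum(u ** 2) / n)             # lies in [1/k, 1]
    f_norm = (f - 1.0 / k) / (1.0 - 1.0 / k)  # lies in [0, 1]
    return f, f_norm

def check_fuzziness(u, memb_exp, tol=1e-3):
    """Hypothetical helper: warn when every membership is within tol of 1/k."""
    u = np.asarray(u, dtype=float)
    k = u.shape[1]
    if np.all(np.abs(u - 1.0 / k) <= tol):
        warnings.warn(
            f"complete fuzziness: all memberships are about 1/k = {1.0 / k:.3f}; "
            f"consider a memb.exp smaller than {memb_exp}"
        )
        return True
    return False

# Example: a crisp and a completely fuzzy membership matrix for n = 4 objects.
u_crisp = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
u_fuzzy = np.full((4, 3), 1.0 / 3.0)
print(dunn_coefficients(u_crisp))  # (1.0, 1.0)
```

Checking every membership against 1/k directly matches the symptom described above. The normalized Dunn coefficient gives a single number that is easier to report.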

Martin Maechler,
ETH Zurich