[R] fuzzy classification and dissimilarity matrix
Martin Maechler
maechler at stat.math.ethz.ch
Mon Apr 10 17:39:48 CEST 2006
[Replying to myself :-) ...]
>>>>> "MM" == Martin Maechler <maechler at stat.math.ethz.ch>
>>>>> on Fri, 7 Apr 2006 18:44:22 +0200 writes:
>>>>> "Jeanne" == Jeanne Vallet <Jeanne.Vallet at inh.fr>
>>>>> on Fri, 7 Apr 2006 12:43:59 +0200 writes:
Jeanne> Hello, I want to make a fuzzy classification from a
Jeanne> dissimilarity matrix (calculated with daisy from
Jeanne> package 'cluster'). I have tried to use fanny
Jeanne> (package cluster) but I have the same problem as
Jeanne> described in a previous message
Jeanne> (http://tolstoy.newcastle.edu.au/R/help/05/05/4546.html)
Jeanne> i.e. it always gives me two clusters in the results
Jeanne> (even if k is different from 2) with the same
Jeanne> memberships for all clusters. One solution suggested
Jeanne> in the previous message was to use cmeans from
Jeanne> package e1071. The problem is that it seems that
Jeanne> cmeans doesn't work with a dissimilarity matrix and I
Jeanne> have to use daisy because I have mixed data. What
Jeanne> can I do?
Thank you, Jeanne, for the example that I've received (off line)
from you.
MM> This looks like a bug in fanny(), actually somewhere in
MM> its Fortran code.
I'm no longer sure that this is a bug; see below.
MM> Following the above URL leads to a
MM> reproducible example -- eventually! -- from Matthias
MM> Temple, though a relatively large example.
MM> I'd like to look at the bug and fix it as soon as
MM> possible. If you want you can also send me (privately,
MM> not via R-help!) your dissimilarity object {resulting
MM> from daisy()}, or preferably your data set and the exact
MM> calls you used {daisy(), then fanny()}, to produce the
MM> outcome you mention above.
MM> Regards, Martin Maechler, ETH Zurich
fanny() {from package "cluster"} has had an optional argument
'memb.exp = 2' for a while now.
?fanny contains
>> Arguments:
>>
>> ..................
>>
>> memb.exp: number r strictly larger than 1 specifying the _membership
>> exponent_ used in the fit criterion; see the 'Details' below.
>> Default: '2' which used to be hardwired inside FANNY.
>>
>> ..................
>>
>>
>> Details:
>>
>> ..................
>>
>> Fanny aims to minimize the objective function
>>
>> SUM_[v=1..k] (SUM_(i,j) u(i,v)^r u(j,v)^r d(i,j)) / (2 SUM_j u(j,v)^r)
>>
>> where n is the number of observations, k is the number of
>> clusters, r is the membership exponent 'memb.exp' and d(i,j) is
>> the dissimilarity between observations i and j.
>> Note that r -> 1 gives increasingly crisper clusterings whereas r
>> -> Inf leads to complete fuzzyness. K&R(1990), p.191 note that
>> values too close to 1 can lead to slow convergence.
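To make the quoted criterion concrete, here is a minimal sketch of it in R. The helper name 'fanny_objective' is hypothetical (it is not part of package 'cluster'), and the sketch assumes that the inner sum runs over all ordered pairs (i,j), which is what the factor 2 in the denominator suggests. It is not claimed to reproduce the value that fanny() itself reports.

```r
# Hypothetical helper, not part of 'cluster': evaluates the criterion
# quoted from ?fanny for a given membership matrix.
#   u : n x k matrix of memberships u(i,v)
#   d : n x n dissimilarity matrix (or a "dist" object)
#   r : membership exponent ('memb.exp')
fanny_objective <- function(u, d, r = 2) {
  d  <- as.matrix(d)
  ur <- u^r
  # (d %*% ur)[i, v] = SUM_j d(i,j) u(j,v)^r, so the column sums of
  # ur * (d %*% ur) give SUM_(i,j) u(i,v)^r u(j,v)^r d(i,j) per cluster v
  num <- colSums(ur * (d %*% ur))
  den <- 2 * colSums(ur)
  sum(num / den)
}

# Two observations at distance 1, k = 2 clusters
d <- matrix(c(0, 1, 1, 0), 2, 2)
fanny_objective(diag(2), d)               # crisp: one observation per cluster
fanny_objective(matrix(0.5, 2, 2), d)     # completely fuzzy: all u(i,v) = 1/k
```

Evaluating the function for different r on the same memberships shows how the exponent changes the weight given to intermediate memberships.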
If you have read this, it seems natural to try setting
'memb.exp = 1.5' or even 'memb.exp = 1.2', because values closer
to 1 are said to lead to less fuzzy clusterings.
And indeed, both for your example and the one mentioned earlier
in this thread, setting 'memb.exp' to values less than 2, and
particularly to values ``closer to 1 than to 2'',
gives much better results: you no longer get the
``all is fuzzy'' result that you had when the memberships
u(i,v) were all (approximately) 1/k.
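A minimal sketch of such a comparison follows. It does not use the data from this thread: as an assumption, it takes the mixed-type 'flower' data set that ships with 'cluster' as a stand-in, and the choice k = 3 is arbitrary. fanny() may emit warnings (e.g. about convergence) for some settings.

```r
library(cluster)

# Stand-in data (assumption): 'flower' has factor, ordered and numeric
# columns, so daisy() computes a Gower-type dissimilarity.
data(flower)
d <- daisy(flower)

# Same dissimilarities and k, two membership exponents
f20 <- fanny(d, k = 3)                  # default memb.exp = 2
f12 <- fanny(d, k = 3, memb.exp = 1.2)  # closer to 1, i.e. crisper

# Compare the membership matrices: rows that are all about 1/k
# indicate the "all is fuzzy" result
round(head(f20$membership), 2)
round(head(f12$membership), 2)

# Dunn's partition coefficient; a normalized value near 0 means
# a very fuzzy clustering
f20$coeff
f12$coeff
```

If the memberships under the default exponent come out close to 1/k everywhere, trying a few values between 1 and 2 is a cheap first check before suspecting the data or the code.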
For the next version of 'cluster',
I'll add a bit more to the help page -- *and* a warning + note
about modifying 'memb.exp' for situations like these, where
"complete fuzziness" has resulted.
Martin Maechler,
ETH Zurich