[R] agnes() in package cluster on R 2.14.1 and R 3.0.1
Martin Maechler
maechler at stat.math.ethz.ch
Wed Jun 12 14:59:48 CEST 2013
>>>>> Hugo Varet <varethugo at gmail.com>
>>>>> on Tue, 11 Jun 2013 15:15:36 +0200 writes:
> Dear Martin,
> Thank you for your answer. Here is the exact call to agnes():
> setwd("E:/Hugo")
> library(cluster)
> load("mydata.rda")
> tableauTani<-dist.binary(mydata, method = 4, diag = FALSE, upper = FALSE)
> resAgnes.Tani<-agnes(tableauTani, diss = inherits(tableauTani, "dist"),
>                      method = "ward")
> classe.agnTani.3 <- cutree(resAgnes.Tani, 3)
> I'm going to send you the data in a separate e-mail.
Thank you, Hugo, and I got that alright.
I can see that many of the distances are *identical*, because
your data is completely binary.
From experience, I know that this can lead (for some algorithms)
to "arbitrary" decisions in clustering: when two
*pairs* of observations / clusters have exactly the same
distance, it is somewhat random which of the two pairs is "merged" /
"fused" first in a bottom-up hierarchical algorithm such as agnes().
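
A minimal sketch of this tie effect, under stated assumptions: the data are made-up random 0/1 values (not Hugo's), and base R's dist(method = "binary") stands in for dist.binary(), which is not part of standard R. The sketch counts how few distinct distance values occur, then clusters the same dissimilarities in two different row orders. With ties present, the two 3-cluster partitions may or may not agree.

```r
library(cluster)  # recommended package shipped with R; provides agnes()

set.seed(1)
# Hypothetical binary data: 30 observations on 6 0/1 variables
x <- matrix(rbinom(30 * 6, size = 1, prob = 0.5), nrow = 30)

# Assumption: base R's "binary" (Jaccard-type) distance as a stand-in
d <- dist(x, method = "binary")

# Many of the pairwise distances coincide exactly
n_pairs  <- length(d)
n_unique <- length(unique(as.vector(d)))
cat(n_pairs, "distances, only", n_unique, "distinct values\n")

# Same dissimilarities, observations presented in a permuted order
perm   <- sample(nrow(x))
d_perm <- as.dist(as.matrix(d)[perm, perm])

a1 <- agnes(d,      diss = TRUE, method = "ward")
a2 <- agnes(d_perm, diss = TRUE, method = "ward")

# Cut both trees into 3 clusters; map the second back to the original order
k1 <- cutree(as.hclust(a1), 3)
k2 <- cutree(as.hclust(a2), 3)[order(perm)]

# Cross-tabulate: off-pattern cells would indicate tie-dependent merges
print(table(original = k1, permuted = k2))
```

If every row and column of the cross-table has a single non-zero cell, the two partitions are the same up to relabelling; anything else points to order-dependent tie-breaking rather than to a change in the algorithm.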
To reproduce your example (above), however, I need to know
*where* you got the dist.binary() function from.
It is not part of standard R nor of the cluster package.
Regards,
Martin
> Regards,
> Hugo
> On Monday, 10 June 2013, Martin Maechler <maechler at stat.math.ethz.ch>
> wrote:
>>>>>>> Hugo Varet <varethugo at gmail.com>
>>>>>>> on Sun, 9 Jun 2013 11:43:32 +0200 writes:
>>
>> > Dear R users,
>> > I discovered something strange using the function agnes() of the cluster
>> > package on R 3.0.1 and on R 2.14.1. Indeed, the clusterings obtained are
>> > different whereas I ran exactly the same code.
>>
>> hard to believe... but ..
>>
>> > I quickly looked at the source code of the function and I discovered that
>> > there was an important change: agnes() in R 2.14.1 used a FORTRAN code
>> > whereas agnes() in R 3.0.1 uses a C code.
>>
>> well, it has done so for quite a bit longer, e.g., also in R 2.15.0
>>
>> > Here is one of the contingency tables between R 2.14.1 and R 3.0.1:
>> >                      classe.agnTani.2.14.1
>> > classe.agnTani.3.0.1   1   2   3
>> >                    1  74   0 229
>> >                    2   0 235   0
>> >                    3 120   0  15
>>
>> > So, I was wondering if it was normal that the C and FORTRAN codes give
>> > different results?
>>
>> It's not normal, and I'm pretty sure I have had many many
>> examples which gave identical results.
>>
>> Can you provide a reproducible example, please?
>> If the example is too large [for dput() ], please send me the *.rda
>> file produced from
>> save(<your data>, file=<the file I need>)
>> *and* the exact call to agnes() for your data.
>>
>> Thank you in advance!
>>
>> Martin Maechler,
>> the one you could have e-mailed directly
>> to using maintainer("cluster") ...
>>
>>
>> > Best regards,
>> > Hugo Varet
>>
>> > [[alternative HTML version deleted]]
>> ^^^^^^^^^^^^^ try to avoid, please ^^^^^^^^^^^^^^^^^
>>
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> yes indeed, please.
>>