[R] cluster a distance(analogue)-object using agnes(cluster)
Gavin Simpson
gavin.simpson at ucl.ac.uk
Fri Sep 5 15:19:36 CEST 2008
On Thu, 2008-09-04 at 11:28 +0200, Martin Maechler wrote:
> >>>>> "B" == Birgitle <birgit.lemcke at systbot.uzh.ch>
> >>>>> on Tue, 2 Sep 2008 03:02:31 -0700 (PDT) writes:
>
> B> I am trying to perform a clustering using an existing dissimilarity matrix that I
> B> calculated using distance() (analogue).
> B> I tried two different things. One of them worked and the other did not, and I don't
> B> understand why.
> B> Here is the code:
>
> B> non-working example:
>
> B> library(cluster)
> B> library(analogue)
>
> B> iris2 <- as.data.frame(iris)
>
> why that? After the above, iris2 is identical() to iris !
>
> B> str(iris2)
> B> 'data.frame': 150 obs. of 5 variables:
> B> $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
> B> $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
> B> $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
> B> $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
> B> $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1
> B> 1 1 1 ...
>
> B> Test.Gower <- distance(iris2, method ="mixed")
>
> why not just
> daisy(iris2, metric = "gower")
>
> daisy() is in cluster which has been a recommended R package
> "forever".
>
> So the solution (here, not in general!)
> is to stay with package 'cluster' and use
> daisy() before agnes().
And as the author of distance(), I agree *completely*, and say so in the
Environmetrics Task View.
distance() was written for a very specific task, and that it does what
you want it to, Birgit, is a side effect of the way I wrote distance().
Anyway, Birgit, you'll be glad to know that the example you included
works in the R-forge version of analogue (0.5-4 to be). Get it from:
https://r-forge.r-project.org/R/?group_id=69
or directly from within R
install.packages("analogue", repos="http://R-Forge.R-project.org")
if you want to use distance(), but I'd be using the cluster package
tools myself for the problem you emailed about.
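For what it's worth, a minimal sketch of that cluster-only route might
look like the following. It assumes the built-in iris data from the
example; the object names (iris.gower, iris.agnes, groups) are made up
for illustration only.

```r
library(cluster)  # recommended package, installed with R

# Gower dissimilarities; daisy() copes with the mix of numeric
# columns and the Species factor
iris.gower <- daisy(iris, metric = "gower")

# daisy() returns an object that also inherits from "dist", so
# agnes() treats it as a dissimilarity without diss being set
iris.agnes <- agnes(iris.gower, method = "average")

# cut the tree into three groups and cross-tabulate with Species
groups <- cutree(as.hclust(iris.agnes), k = 3)
table(groups, iris$Species)
```

Because Species is itself one of the variables here, the groups will
tend to follow it; drop that column first if that is not what you want.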
You also probably want to use as.dist() on the output from distance() to
store the matrix in a more compact form. Because of how I wanted to
calculate distances (between two data sets), the matrix output was
essential, but this output is storing (and computing) redundant data in
the case you cite (a single matrix).
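A rough sketch of the as.dist() step, under the assumption that the
object being converted is a plain square, symmetric numeric matrix.
To keep it self-contained I build a stand-in for that matrix with
daisy() rather than calling distance(); the names full and compact are
illustrative only.

```r
library(cluster)

# Stand-in for a full square dissimilarity matrix (assumption: this is
# the general shape of what you would pass to as.dist())
full <- as.matrix(daisy(iris, metric = "gower"))
dim(full)

# as.dist() keeps only the lower triangle, n * (n - 1) / 2 values
compact <- as.dist(full)
length(compact)

# the compact form needs less memory than the square matrix
object.size(full) > object.size(compact)

# a "dist" object can go straight into agnes()
AgnesA <- agnes(compact, method = "average")
```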
All the best,
G
>
> Regards,
> Martin Maechler, ETH Zurich
> {same city! feel free to phone me..}
>
> B> Test.Gower.agnes<-agnes(Test.Gower, diss=T)
> B> Error in agnes(Test.Gower, diss = T) :
> B> (list) object cannot be coerced to type 'logical'
>
> B> working example (only numeric variables used):
>
> B> library(cluster)
> B> library(analogue)
>
> B> irisPart<-subset(iris, select= Sepal.Length:Petal.Width)
> B> Dist.Gower <- distance(irisPart, method ="mixed")
> B> AgnesA <- agnes(Dist.Gower, method="average", diss=TRUE)
>
> B> It would be great if somebody could help me.
> B> The dataset that I would like to use for the clustering also contains
> B> factors, and gives me the same error message as in the non-working example.
>
> B> Thanks in advance
>
> B> B.
>
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson [t] +44 (0)20 7679 0522
ECRC, UCL Geography, [f] +44 (0)20 7679 0565
Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%