[R] Find the ideal cluster
David Carlson
dcarlson at tamu.edu
Sat Dec 12 16:49:46 CET 2020
Look at the Cluster Analysis Task View
(https://cran.r-project.org/web/views/Cluster.html), particularly the
section "Additional Functionality".
Package clValid may be what you need. Its description reads:
The R package clValid contains functions for validating the results
of a clustering analysis. There are three main types of cluster
validation measures available, “internal”, “stability”, and
“biological”. The user can choose from nine clustering algorithms in
existing R packages, including hierarchical, K-means, self-organizing
maps (SOM), and model based clustering. In addition, we provide a
function to perform the self-organizing tree algorithm (SOTA) method
of clustering. Any combination of validation measures and clustering
methods can be requested in a single function call. This allows the
user to simultaneously evaluate several clustering algorithms while
varying the number of clusters, to help determine the most appropriate
method and number of clusters for the dataset of interest.
Additionally, the package can automatically make use of the biological
information contained in the Gene Ontology (GO) database to calculate
the biological validation measures, via the annotation packages
available in Bioconductor. The function returns an object of S4 class
clValid, which has summary, plot, print, and additional methods which
allow the user to display the optimal validation scores and extract
clustering results.
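A minimal sketch of the single-call comparison described above, applied to the six industries from the question below. It assumes clValid is installed from CRAN and uses only internal validation for average-linkage hierarchical clustering with 2 to 5 clusters. Two assumptions are worth flagging: clValid computes Euclidean distances on the columns it is given, so latitude/longitude degrees are treated as planar coordinates (not the geodesic distances from geosphere used in the question), and `neighbSize = 3` is an arbitrary choice because the default of 10 neighbours exceeds the six available points.

```r
# Sketch only: assumes the clValid package is installed from CRAN.
library(clValid)

# Coordinates of the six industries from the question.
coords <- cbind(
  Latitude  = c(-23.8, -23.8, -23.9, -23.7, -23.7, -23.7),
  Longitude = c(-49.5, -49.6, -49.7, -49.8, -49.6, -49.9)
)
rownames(coords) <- paste0("industry", 1:6)  # give the observations names

# Internal validation (connectivity, Dunn index, silhouette width) of
# average-linkage hierarchical clustering for 2 to 5 clusters.
# Assumption: Euclidean distance on raw degrees, i.e. a planar approximation.
cv <- clValid(coords, nClust = 2:5,
              clMethods  = "hierarchical",
              validation = "internal",
              method     = "average",  # same linkage as hclust(d, "average")
              neighbSize = 3)          # connectivity neighbours; default 10 > 6 points

summary(cv)        # scores for every k, plus the optimum per measure
optimalScores(cv)  # best score, method and number of clusters per measure
```

With only six industries, the measures can disagree and are sensitive to single points, so they are best read as a guide rather than a verdict.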
David L Carlson
Professor Emeritus
Texas A&M University
On Sat, Dec 12, 2020 at 9:27 AM Jovani T. de Souza
<jovanisouza5 at gmail.com> wrote:
>
> My colleagues and I developed a hierarchical clustering algorithm to
> find the main clusters of agricultural industries in a given city
> (e.g. London). We implemented the algorithm in R, and it works well.
> Depending on the filters we apply in the algorithm, we generated 6
> clustering scenarios for London: for example, the first scenario
> produced 2 clusters, the second 5 clusters, and so on. I would like
> some help choosing the most appropriate one. I saw that some packages
> help with this, such as `pvclust`, but I could not apply it to my
> case. Below is a brief executable example that shows the essence of
> what I want.
>
> Any help is welcome! If you know how to do this with another package,
> feel free to describe it.
>
> Best regards.
>
>
> library(rdist)
> library(geosphere)
> library(fpc)
>
>
> df <- structure(list(Industries = c(1, 2, 3, 4, 5, 6),
>                      Latitude   = c(-23.8, -23.8, -23.9, -23.7, -23.7, -23.7),
>                      Longitude  = c(-49.5, -49.6, -49.7, -49.8, -49.6, -49.9),
>                      Waste      = c(526, 350, 526, 469, 534, 346)),
>                 class = "data.frame", row.names = c(NA, -6L))
>
> df1<-df
>
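> # clusters: geodesic distances (metres) between industries;
> # distm() expects longitude first, hence columns 2:1
> coordinates <- df[c("Latitude", "Longitude")]
> d <- as.dist(distm(coordinates[, 2:1]))
> fit.average <- hclust(d, method = "average")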
>
> clusters<-cutree(fit.average, k=2)
> df$cluster <- clusters
> > df
>   Industries Latitude Longitude Waste cluster
> 1          1    -23.8     -49.5   526       1
> 2          2    -23.8     -49.6   350       1
> 3          3    -23.9     -49.7   526       1
> 4          4    -23.7     -49.8   469       2
> 5          5    -23.7     -49.6   534       1
> 6          6    -23.7     -49.9   346       2
> >
> clusters1<-cutree(fit.average, k=5)
> df1$cluster <- clusters1
> > df1
>   Industries Latitude Longitude Waste cluster
> 1          1    -23.8     -49.5   526       1
> 2          2    -23.8     -49.6   350       1
> 3          3    -23.9     -49.7   526       2
> 4          4    -23.7     -49.8   469       3
> 5          5    -23.7     -49.6   534       4
> 6          6    -23.7     -49.9   346       5
> >
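The cuts shown above (k = 2 and k = 5) can be compared directly with a standard internal criterion such as the average silhouette width, where larger values indicate tighter, better-separated clusters. The sketch below is one such heuristic, not the only valid one: it uses the `silhouette()` function from the recommended cluster package on the same average-linkage tree. The helper `haversine_dist()` is hypothetical; it stands in for `geosphere::distm()` only so the sketch runs without extra packages, and because `distm()` defaults to an ellipsoidal model the distances differ slightly.

```r
library(cluster)  # recommended package shipped with R; provides silhouette()

# Hypothetical helper: great-circle (haversine) distances in metres on a
# sphere of radius r, returned as a "dist" object.
haversine_dist <- function(lat, lon, r = 6378137) {
  lat <- lat * pi / 180
  lon <- lon * pi / 180
  n <- length(lat)
  m <- matrix(0, n, n)
  for (i in seq_len(n)) {
    a <- sin((lat - lat[i]) / 2)^2 +
      cos(lat[i]) * cos(lat) * sin((lon - lon[i]) / 2)^2
    m[i, ] <- 2 * r * asin(pmin(1, sqrt(a)))
  }
  as.dist(m)
}

df <- data.frame(
  Industries = 1:6,
  Latitude   = c(-23.8, -23.8, -23.9, -23.7, -23.7, -23.7),
  Longitude  = c(-49.5, -49.6, -49.7, -49.8, -49.6, -49.9)
)

d <- haversine_dist(df$Latitude, df$Longitude)
fit.average <- hclust(d, method = "average")

# Average silhouette width for each candidate number of clusters.
# Silhouettes are only defined for 2 <= k <= n - 1, i.e. k = 2..5 here.
ks <- 2:(nrow(df) - 1)
avg_sil <- sapply(ks, function(k) {
  cl <- cutree(fit.average, k = k)
  summary(silhouette(cl, d))$avg.width
})
names(avg_sil) <- ks
print(round(avg_sil, 3))

best_k <- ks[which.max(avg_sil)]
cat("k with the largest average silhouette width:", best_k, "\n")
```

To reuse the original geosphere distances, replace the `d <-` line with `d <- as.dist(distm(df[, c("Longitude", "Latitude")]))` after `library(geosphere)`. With fpc loaded, as in the question, `cluster.stats(d, cl)$avg.silwidth` reports the same quantity alongside other indices such as `$dunn`. If the six scenarios come from different filters rather than different cuts of one tree, the same score can still be computed for each scenario's labels, provided they cover the same industries and use the same distance matrix.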
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.