[R] Clustering of datasets
Rui Barradas
ru|pb@rr@d@@ @end|ng |rom @@po@pt
Mon Sep 5 15:02:35 CEST 2022
Hello,
I am not at all sure that the following answers the question.
The code below tries to find the optimal number of clusters. One of the
changes I have made to your call to kmeans is to subset DMs without dropping
the dim attribute.
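As a side illustration of what drop = FALSE changes, here is a minimal sketch on a hypothetical three-row data frame (toy is made up for illustration, it is not the DMs data). Plain [, 2] returns a bare numeric vector with no dimensions, while drop = FALSE keeps a one-column data frame. kmeans() accepts either form because it calls as.matrix() on its input, so both give the same clustering under the same seed. The one-column form is presumably what factoextra::fviz_nbclust needs, since it expects a matrix or data frame.

```r
# Hypothetical toy data frame shaped like DMs (a label column and a numeric column)
toy <- data.frame(Country = c("A", "B", "C"),
                  DATA = c(-0.01, -0.02, -0.30),
                  stringsAsFactors = FALSE)

v <- toy[, 2]                # default drop = TRUE: the column becomes a plain vector
d <- toy[, 2, drop = FALSE]  # drop = FALSE: still a 3 x 1 data frame

is.null(dim(v))  # TRUE, a plain vector has no dim
dim(d)           # 3 1

# kmeans() converts both inputs to a one-column matrix, so with the same seed
# the cluster assignments are the same
set.seed(1)
kv <- kmeans(v, centers = 2)
set.seed(1)
kd <- kmeans(d, centers = 2)
identical(unname(kv$cluster), unname(kd$cluster))
```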
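library(cluster)

# Subset DMs without dropping the dim attribute: dm is a one-column data frame
dm <- DMs[, 2, drop = FALSE]

# Total within-cluster sum of squares for k = 1, ..., max_clust
set.seed(2022)  # kmeans uses random starts; fix the seed for reproducible results
max_clust <- 10
wss <- numeric(max_clust)
for(k in 1:max_clust) {
  km <- kmeans(dm, centers = k, nstart = 25)
  wss[k] <- km$tot.withinss
}
plot(wss, type = "b", xlab = "Number of clusters k", ylab = "Total within SS")

# Where is the elbow, at 2 or at 4?
# (the factoextra package must be installed)
factoextra::fviz_nbclust(dm, kmeans, method = "wss")
factoextra::fviz_nbclust(dm, kmeans, method = "silhouette")

k2 <- kmeans(dm, centers = 2, nstart = 25)
k3 <- kmeans(dm, centers = 3, nstart = 25)
k4 <- kmeans(dm, centers = 4, nstart = 25)
# centers is a k x 1 matrix, so nrow() gives the number of clusters
main2 <- paste(nrow(k2$centers), "clusters")
main3 <- paste(nrow(k3$centers), "clusters")
main4 <- paste(nrow(k4$centers), "clusters")

# Plot the three solutions side by side, points colored by cluster
old_par <- par(mfcol = c(1, 3))
plot(DMs[, 2], col = k2$cluster, pch = 19, main = main2)
plot(DMs[, 2], col = k3$cluster, pch = 19, main = main3)
plot(DMs[, 2], col = k4$cluster, pch = 19, main = main4)
par(old_par)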
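To put numbers on the "2 or 4?" question, the average silhouette width per k can also be computed directly with cluster::silhouette(), which is roughly what the silhouette method of fviz_nbclust plots. This is a minimal sketch, assuming the DMs data from the original message (copied in so the block runs on its own); the range k = 2:6 and the seed are arbitrary choices. The k with the largest average width is the one the silhouette criterion prefers, and split() lists which countries fall together in the 2-cluster solution.

```r
library(cluster)  # for silhouette()

# Data copied from the original message
DMs <- read.table(text = "Country DATA
IS -0.0092
BA -0.0235
HK -0.0239
JA -0.0333
KU -0.0022
OM -0.0963
QA -0.0706
SK -0.0322
SA -0.1233
SI -0.0141
TA -0.0142
UAE -0.0656
AUS -0.0230
BEL -0.0006
CYP -0.0085
CR -0.0398
DEN -0.0423
EST -0.0604
FIN -0.0227
FRA -0.0085
GER -0.0272
GrE -0.3519
ICE -0.0210
IRE -0.0057
LAT -0.0595
LITH -0.0451
LUXE -0.0023
MAL -0.0351
NETH -0.0048
NOR -0.0495
POL -0.0081
PORT -0.0044
SLOVA -0.1210
SLOVE -0.0031
SPA -0.0213
SWE -0.0106
SWIT -0.0152
UK -0.0030
HUNG -0.0086
CAN -0.0144
CHIL -0.0078
USA -0.0042
BERM -0.0035
AUST -0.0211
NEWZ -0.0538", header = TRUE, stringsAsFactors = FALSE)

dm <- DMs[, 2, drop = FALSE]
d <- dist(dm)  # Euclidean distances between the 45 values

set.seed(2022)  # arbitrary seed; kmeans uses random starts
ks <- 2:6
avg_sil <- sapply(ks, function(k) {
  km <- kmeans(dm, centers = k, nstart = 25)
  sil <- silhouette(km$cluster, d)  # one row per country
  mean(sil[, "sil_width"])          # average silhouette width for this k
})
names(avg_sil) <- ks
round(avg_sil, 3)
ks[which.max(avg_sil)]  # k preferred by the silhouette criterion

# Which countries end up together in the 2-cluster solution?
km2 <- kmeans(dm, centers = 2, nstart = 25)
split(DMs$Country, km2$cluster)
```

Note that cluster numbers from kmeans are arbitrary labels, so "cluster 1" is not guaranteed to be the more (or less) integrated group from one run to the next.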
Hope this helps,
Rui Barradas
At 12:31 on 05/09/2022, Subhamitra Patra wrote:
> Dear all,
>
> I am about to cluster my datasets using the k-means clustering technique in
> R, but I am getting somewhat scattered results. I have pasted my code
> below. Please tell me where my code is lacking. I pasted my
> data before applying the k-means method, as follows.
>
> DMs<-read.table(text="Country DATA
> IS -0.0092
> BA -0.0235
> HK -0.0239
> JA -0.0333
> KU -0.0022
> OM -0.0963
> QA -0.0706
> SK -0.0322
> SA -0.1233
> SI -0.0141
> TA -0.0142
> UAE -0.0656
> AUS -0.0230
> BEL -0.0006
> CYP -0.0085
> CR -0.0398
> DEN -0.0423
> EST -0.0604
> FIN -0.0227
> FRA -0.0085
> GER -0.0272
> GrE -0.3519
> ICE -0.0210
> IRE -0.0057
> LAT -0.0595
> LITH -0.0451
> LUXE -0.0023
> MAL -0.0351
> NETH -0.0048
> NOR -0.0495
> POL -0.0081
> PORT -0.0044
> SLOVA -0.1210
> SLOVE -0.0031
> SPA -0.0213
> SWE -0.0106
> SWIT -0.0152
> UK -0.0030
> HUNG -0.0086
> CAN -0.0144
> CHIL -0.0078
> USA -0.0042
> BERM -0.0035
> AUST -0.0211
> NEWZ -0.0538" ,
> header = TRUE,stringsAsFactors=FALSE)
> library(cluster)
> k1 <- kmeans(DMs[, 2], centers = 2, nstart = 25)
> plot(DMs[, 2], col = k1$cluster, pch = 19, xlim = c(1, nrow(DMs)),
>      ylim = range(DMs[, 2]))
> text(seq_len(nrow(DMs)), DMs[, 2], DMs[, 1], col = k1$cluster)
> legend("bottomright", c("cluster 1: Highly Integrated", "cluster 2: Less Integrated"),
>        col = 1:2, pch = 19)
>
>