[R] Help with K-Means output
Bill Poling
Bill@Poling @ending from zeli@@com
Sat Dec 8 16:03:19 CET 2018
Good afternoon. I hope I have provided enough info to get my question answered.
I am running Windows 10 -- R 3.5.1 -- RStudio Version 1.1.456
When running a K-Means clustering routine, is it possible to get the actual data rows from each cluster into a data frame?
I have reviewed a number of tutorials (listed below) and, unless I missed it somewhere, none of them shows how to do this.
https://www.datacamp.com/community/tutorials/k-means-clustering-r
https://....guru99..../r-k-means-clustering.html
https://datascienceplus.com/k-means-clustering-in-r/
https://datascienceplus.com/finding-optimal-number-of-clusters/
http://enhancedatascience.com/2017/10/24/machine-learning-explained-kmeans/
http://enhancedatascience.com/2017/04/30/r-basics-k-means-r/
For example:
I ran the code below and got a K-means clustering with 10 clusters of sizes 1511, 1610, 702, 926, 996, 1076, 580, 2429, 728, 3797.
Can the 1511 rows of SavingsReversed and ProviderID in the first cluster, the 1610 rows in the second cluster, etc. be extracted into separate data frames?
Thank you for your help.
WHP
str(rr0)
Classes 'data.table' and 'data.frame': 14355 obs. of 2 variables:
$ SavingsReversed: num 0 0 61 128 160 ...
$ ProviderID : num 113676 113676 116494 116641 116641 ...
- attr(*, ".internal.selfref")=<externalptr>
head(rr0, n=35)
SavingsReversed ProviderID
1: 0.00 113676
2: 0.00 113676
3: 61.00 116494
4: 128.25 116641
5: 159.60 116641
6: 372.66 119316
7: 18.79 121319
8: 15.64 121319
9: 0.00 121319
10: 18.79 121319
11: 23.00 121319
12: 18.79 121319
13: 0.00 121319
14: 25.86 121319
15: 14.00 121319
16: 113.00 121545
17: 50.00 121545
18: 1155.32 121545
19: 113.00 121545
20: 197.20 121545
21: 0.00 121780
22: 36.00 122536
23: 1171.32 125198
24: 1171.32 125198
25: 43.00 125303
26: 0.00 125881
27: 69.64 128435
28: 420.18 128435
29: 175.18 128435
30: 71.54 128435
31: 99.85 128435
32: 0.00 128435
33: 42.75 128435
34: 175.18 128435
35: 846.45 128435
set.seed(213)
rr0a <- kmeans(rr0, 10)
View(rr0a)
summary(rr0a)
# Length Class Mode
# cluster 14355 -none- numeric
# centers 20 -none- numeric
# totss 1 -none- numeric
# withinss 10 -none- numeric
# tot.withinss 1 -none- numeric
# betweenss 1 -none- numeric
# size 10 -none- numeric
# iter 1 -none- numeric
# ifault 1 -none- numeric
x1 <- as.data.frame(rr0a$centers)
x1[order(x1$SavingsReversed), ] # sort the cluster centers by SavingsReversed
# SavingsReversed ProviderID
# 2 75.19665 2773789.2
# 3 99.31959 4147091.6
# 5 101.21070 3558532.7
# 4 103.41147 3893274.4
# 1 105.38310 2241031.2
# 8 114.61562 3240701.5
# 10 121.14184 4718727.6
# 9 153.70536 4470878.9
# 6 156.84426 5560636.6
# 7 185.09745 173732.9
print(rr0a)
# K-means clustering with 10 clusters of sizes 1511, 1610, 702, 926, 996, 1076, 580, 2429, 728, 3797
#
# Cluster means:
# SavingsReversed ProviderID
# 1 105.38310 2241031.2
# 2 75.19665 2773789.2
# 3 99.31959 4147091.6
# 4 103.41147 3893274.4
# 5 101.21070 3558532.7
# 6 156.84426 5560636.6
# 7 185.09745 173732.9
# 8 114.61562 3240701.5
# 9 153.70536 4470878.9
# 10 121.14184 4718727.6
#
# Within cluster sum of squares by cluster:
# [1] 74529288379846 25846368411171 4692898666512 6277704963344 8428785199973 90824041558798 1468798013919 12143462193009 5483877005233
# [10] 51547955737867
# (between_SS / total_SS = 98.7 %)
#
# Available components:
#
# [1] "cluster" "centers" "totss" "withinss" "tot.withinss" "betweenss" "size" "iter" "ifault"
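
One possible way to do what is asked above, offered as a sketch rather than a definitive answer: the `cluster` component listed under "Available components" holds one integer label per input row, in the same order as the data passed to kmeans(), so it can be attached to the data and used with split(). The `toy` data frame, its values, and the names `km`, `by_cluster` and `cluster1` are made up for illustration; the example uses a plain data.frame, and for a data.table such as rr0 it assumes the object is first converted with as.data.frame().

```r
# Toy stand-in for rr0 (values are illustrative only, not the real data)
set.seed(213)
toy <- data.frame(
  SavingsReversed = c(0, 61, 128.25, 372.66, 18.79, 1155.32, 36, 846.45),
  ProviderID      = c(113676, 116494, 116641, 119316, 121319, 121545, 122536, 128435)
)

km <- kmeans(toy, centers = 3)

# km$cluster has one integer label per row, in the same order as the input
toy$cluster <- km$cluster

# split() returns a named list with one data frame per cluster
by_cluster <- split(toy, toy$cluster)

# Row counts per piece should agree with km$size
sapply(by_cluster, nrow)

# Pull out a single cluster as its own data frame
cluster1 <- by_cluster[["1"]]
```

Applied to the objects in this post, the same idea would be split(as.data.frame(rr0), rr0a$cluster), which should give a list of 10 data frames whose row counts match the sizes printed by print(rr0a).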