[R] Help with K-Means output

Sat Dec 8 16:03:19 CET 2018

Good afternoon. I hope I have provided enough info to get my question answered.

I am running Windows 10, R 3.5.1, and RStudio version 1.1.456.

After running a k-means clustering routine, is it possible to get the actual rows assigned to each cluster into a data frame?

I have reviewed a number of tutorials and, unless I missed it somewhere, none of them shows how to do this:

https://www.datacamp.com/community/tutorials/k-means-clustering-r
https://....guru99..../r-k-means-clustering.html
https://datascienceplus.com/k-means-clustering-in-r/
https://datascienceplus.com/finding-optimal-number-of-clusters/
http://enhancedatascience.com/2017/10/24/machine-learning-explained-kmeans/
http://enhancedatascience.com/2017/04/30/r-basics-k-means-r/

For example:

I ran the code below and got a k-means clustering with 10 clusters of sizes 1511, 1610, 702, 926, 996, 1076, 580, 2429, 728, and 3797.
Can the 1511 rows (values of SavingsReversed and ProviderID) in the first cluster, the 1610 rows in the second cluster, and so on be written out into separate data frames?

Thank you for your help.

WHP

str(rr0)
Classes 'data.table' and 'data.frame': 14355 obs. of  2 variables:
 $ SavingsReversed: num  0 0 61 128 160 ...
 $ ProviderID     : num  113676 113676 116494 116641 116641 ...
 - attr(*, ".internal.selfref")=<externalptr>

head(rr0, n=35)
    SavingsReversed ProviderID
 1:            0.00     113676
 2:            0.00     113676
 3:           61.00     116494
 4:          128.25     116641
 5:          159.60     116641
 6:          372.66     119316
 7:           18.79     121319
 8:           15.64     121319
 9:            0.00     121319
10:           18.79     121319
11:           23.00     121319
12:           18.79     121319
13:            0.00     121319
14:           25.86     121319
15:           14.00     121319
16:          113.00     121545
17:           50.00     121545
18:         1155.32     121545
19:          113.00     121545
20:          197.20     121545
21:            0.00     121780
22:           36.00     122536
23:         1171.32     125198
24:         1171.32     125198
25:           43.00     125303
26:            0.00     125881
27:           69.64     128435
28:          420.18     128435
29:          175.18     128435
30:           71.54     128435
31:           99.85     128435
32:            0.00     128435
33:           42.75     128435
34:          175.18     128435
35:          846.45     128435

set.seed(213)
rr0a <- kmeans(rr0, 10)
View(rr0a)
summary(rr0a)
#              Length Class  Mode
# cluster      14355  -none- numeric
# centers         20  -none- numeric
# totss            1  -none- numeric
# withinss        10  -none- numeric
# tot.withinss     1  -none- numeric
# betweenss        1  -none- numeric
# size            10  -none- numeric
# iter             1  -none- numeric
# ifault           1  -none- numeric
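
The `cluster` component in the summary above has length 14355, the same as the number of rows in rr0, which suggests it holds one cluster label per input row. A minimal sketch on made-up data to check that reading (the names `toy`, `km` and `label_counts` are hypothetical stand-ins, not objects from my session):

```r
# Made-up stand-in for rr0: two numeric columns, 300 rows.
set.seed(213)
toy <- data.frame(
  SavingsReversed = round(runif(300, 0, 1200), 2),
  ProviderID      = sample(100000:600000, 300, replace = TRUE)
)

km <- kmeans(toy, centers = 3)

# One cluster label per input row.
length(km$cluster) == nrow(toy)

# Counting the labels should reproduce the sizes reported by print(km).
label_counts <- as.vector(table(km$cluster))
label_counts
km$size
```

If that holds for rr0a as well, the labels in `rr0a$cluster` line up row by row with rr0.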

x1 <- as.data.frame(rr0a$centers)
x1[order(x1$SavingsReversed), ]  # order the cluster centers by SavingsReversed
#    SavingsReversed ProviderID
# 2         75.19665  2773789.2
# 3         99.31959  4147091.6
# 5        101.21070  3558532.7
# 4        103.41147  3893274.4
# 1        105.38310  2241031.2
# 8        114.61562  3240701.5
# 10       121.14184  4718727.6
# 9        153.70536  4470878.9
# 6        156.84426  5560636.6
# 7        185.09745   173732.9
print(rr0a)
# K-means clustering with 10 clusters of sizes 1511, 1610, 702, 926, 996, 1076, 580, 2429, 728, 3797
#
# Cluster means:
#   SavingsReversed ProviderID
# 1        105.38310  2241031.2
# 2         75.19665  2773789.2
# 3         99.31959  4147091.6
# 4        103.41147  3893274.4
# 5        101.21070  3558532.7
# 6        156.84426  5560636.6
# 7        185.09745   173732.9
# 8        114.61562  3240701.5
# 9        153.70536  4470878.9
# 10       121.14184  4718727.6
#
# Within cluster sum of squares by cluster:
# [1] 74529288379846 25846368411171  4692898666512  6277704963344  8428785199973 90824041558798  1468798013919 12143462193009  5483877005233
# [10] 51547955737867
# (between_SS / total_SS =  98.7 %)
#
# Available components:
#
#   [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss" "betweenss"    "size"         "iter"         "ifault"
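
Since `cluster` is listed among the available components, the rows belonging to each cluster could presumably be pulled out by splitting the original data on that vector. A sketch on made-up data rather than the real rr0 (the names `toy`, `km`, `toy_labeled`, `by_cluster` and `cluster1` are hypothetical):

```r
# Made-up stand-in for rr0: two numeric columns, 300 rows.
set.seed(213)
toy <- data.frame(
  SavingsReversed = round(runif(300, 0, 1200), 2),
  ProviderID      = sample(100000:600000, 300, replace = TRUE)
)
km <- kmeans(toy, centers = 3)

# Option 1: keep a single data frame and add the label as a column.
toy_labeled <- cbind(toy, cluster = km$cluster)

# Option 2: a named list holding one data frame per cluster.
by_cluster <- split(toy, km$cluster)

# Rows of cluster 1 only; the row counts should match km$size.
cluster1 <- by_cluster[["1"]]
nrow(cluster1)
sapply(by_cluster, nrow)
```

With the real objects this would read `split(rr0, rr0a$cluster)`. rr0 is a data.table, so this assumes the call behaves the same way there as on a plain data frame, which I have not verified.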
