[R] Help with K-Means output

Bill Poling Bill.Poling using zelis.com
Sat Dec 8 18:23:27 CET 2018


Thank you, David. I will try that as well.

WHP

From: David L Carlson <dcarlson using tamu.edu>
Sent: Saturday, December 8, 2018 11:12 AM
To: Bert Gunter <bgunter.4567 using gmail.com>; Bill Poling <Bill.Poling using zelis.com>
Cc: R-help <r-help using r-project.org>
Subject: RE: [R] Help with K-Means output

You should also read the manual page for ?split and learn how to work with lists:

# Split the data according to cluster membership
# to create a list of data frames
rr0.clus <- split(rr0, rr0a$cluster)

# The data frame for cluster 1:
rr0.clus[[1]]
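A self-contained sketch of the split() approach follows. A small simulated data frame stands in for rr0. The data frame `toy`, its columns `x` and `y`, and the choice of 3 clusters are hypothetical, used only for illustration:

```r
# Simulated stand-in for rr0: two numeric columns, three loose groups
set.seed(1)
toy <- data.frame(
  x = c(rnorm(20, 0), rnorm(20, 10), rnorm(20, 20)),
  y = c(rnorm(20, 0), rnorm(20, 10), rnorm(20, 20))
)

km <- kmeans(toy, centers = 3)

# split() groups the rows of toy by each row's cluster label,
# giving a list with one data frame per cluster, named "1", "2", "3"
toy.clus <- split(toy, km$cluster)

names(toy.clus)
sapply(toy.clus, nrow)   # row counts per cluster; these match km$size
toy.clus[[1]]            # the rows assigned to cluster 1
```

Because each list element is an ordinary data frame, it can be used directly, for example with summary(toy.clus[[2]]).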

--------------------------------------------------------
David L. Carlson
Department of Anthropology
Texas A&M University

-----Original Message-----
From: R-help [mailto:r-help-bounces using r-project.org] On Behalf Of Bert Gunter
Sent: Saturday, December 8, 2018 9:46 AM
To: Bill.Poling using zelis.com
Cc: R-help <r-help using r-project.org>
Subject: Re: [R] Help with K-Means output

Please see ?kmeans and note the "cluster" component of the returned value,
which would appear to provide the info you seek.
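The "cluster" component is a vector with one cluster label per row of the input. A minimal sketch follows that attaches those labels as a column and pulls out one cluster. It runs on simulated data: the data frame `toy` is a hypothetical stand-in for rr0, and its values are made up.

```r
# Simulated stand-in for rr0 (values are made up for illustration)
set.seed(42)
toy <- data.frame(
  SavingsReversed = c(runif(30, 0, 100), runif(30, 500, 600)),
  ProviderID      = c(runif(30, 1e5, 2e5), runif(30, 4e6, 5e6))
)

km <- kmeans(toy, centers = 2)

# km$cluster holds one label per input row
length(km$cluster) == nrow(toy)

# Keep the labels alongside the data in a single data frame
toy$cluster <- km$cluster

# Rows belonging to cluster 1, as an ordinary data frame
clus1 <- toy[toy$cluster == 1, ]

# The per-cluster counts agree with the size component
table(toy$cluster)
km$size
```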

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Sat, Dec 8, 2018 at 7:03 AM Bill Poling <Bill.Poling using zelis.com> wrote:

> Good afternoon. I hope I have provided enough info to get my question
> answered.
>
> I am running Windows 10 -- R 3.5.1 -- RStudio Version 1.1.456
>
> When running a k-means clustering routine, is it possible to get the actual
> data from each cluster into a data frame (DF)?
>
> I have reviewed a number of tutorials and, unless I missed it somewhere,
> none of them shows whether this is possible.
>
> https://www.datacamp.com/community/tutorials/k-means-clustering-r
> https://....guru99..../r-k-means-clustering.html
> https://datascienceplus.com/k-means-clustering-in-r/
> https://datascienceplus.com/finding-optimal-number-of-clusters/
> http://enhancedatascience.com/2017/10/24/machine-learning-explained-kmeans/
> http://enhancedatascience.com/2017/04/30/r-basics-k-means-r/
>
> For example:
>
> I ran the code below and got "K-means clustering with 10 clusters of sizes
> 1511, 1610, 702, 926, 996, 1076, 580, 2429, 728, 3797".
> Can the 1511 rows of SavingsReversed and ProviderID in the first cluster, the
> 1610 rows in the second, etc., be written out into separate DFs?
>
> Thank you for your help.
>
> WHP
>
> str(rr0)
> Classes 'data.table' and 'data.frame':14355 obs. of 2 variables:
> $ SavingsReversed: num 0 0 61 128 160 ...
> $ ProviderID : num 113676 113676 116494 116641 116641 ...
> - attr(*, ".internal.selfref")=<externalptr>
>
> head(rr0, n=35)
> SavingsReversed ProviderID
> 1: 0.00 113676
> 2: 0.00 113676
> 3: 61.00 116494
> 4: 128.25 116641
> 5: 159.60 116641
> 6: 372.66 119316
> 7: 18.79 121319
> 8: 15.64 121319
> 9: 0.00 121319
> 10: 18.79 121319
> 11: 23.00 121319
> 12: 18.79 121319
> 13: 0.00 121319
> 14: 25.86 121319
> 15: 14.00 121319
> 16: 113.00 121545
> 17: 50.00 121545
> 18: 1155.32 121545
> 19: 113.00 121545
> 20: 197.20 121545
> 21: 0.00 121780
> 22: 36.00 122536
> 23: 1171.32 125198
> 24: 1171.32 125198
> 25: 43.00 125303
> 26: 0.00 125881
> 27: 69.64 128435
> 28: 420.18 128435
> 29: 175.18 128435
> 30: 71.54 128435
> 31: 99.85 128435
> 32: 0.00 128435
> 33: 42.75 128435
> 34: 175.18 128435
> 35: 846.45 128435
>
> set.seed(213)
> rr0a <- kmeans(rr0, 10)
> View(rr0a)
> summary(rr0a)
> # Length Class Mode
> # cluster 14355 -none- numeric
> # centers 20 -none- numeric
> # totss 1 -none- numeric
> # withinss 10 -none- numeric
> # tot.withinss 1 -none- numeric
> # betweenss 1 -none- numeric
> # size 10 -none- numeric
> # iter 1 -none- numeric
> # ifault 1 -none- numeric
>
> x1 <- as.data.frame(rr0a$centers)
> x1[order(x1$SavingsReversed), ]  # rows of the centers, ordered by SavingsReversed
> #SavingsReversed ProviderID
> # 2 75.19665 2773789.2
> # 3 99.31959 4147091.6
> # 5 101.21070 3558532.7
> # 4 103.41147 3893274.4
> # 1 105.38310 2241031.2
> # 8 114.61562 3240701.5
> # 10 121.14184 4718727.6
> # 9 153.70536 4470878.9
> # 6 156.84426 5560636.6
> # 7 185.09745 173732.9
> print(rr0a)
> # K-means clustering with 10 clusters of sizes 1511, 1610, 702, 926, 996,
> # 1076, 580, 2429, 728, 3797
> #
> # Cluster means:
> # SavingsReversed ProviderID
> # 1 105.38310 2241031.2
> # 2 75.19665 2773789.2
> # 3 99.31959 4147091.6
> # 4 103.41147 3893274.4
> # 5 101.21070 3558532.7
> # 6 156.84426 5560636.6
> # 7 185.09745 173732.9
> # 8 114.61562 3240701.5
> # 9 153.70536 4470878.9
> # 10 121.14184 4718727.6
> #Within cluster sum of squares by cluster:
> # [1] 74529288379846 25846368411171 4692898666512 6277704963344
> # 8428785199973 90824041558798 1468798013919 12143462193009 5483877005233
> # [10] 51547955737867
> # (between_SS / total_SS = 98.7 %)
> #
> # Available components:
> #
> # [1] "cluster" "centers" "totss" "withinss"
> # "tot.withinss" "betweenss" "size" "iter" "ifault"
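One point is worth checking in the print(rr0a) output above. kmeans() assigns points by Euclidean distance, so a column measured in millions (ProviderID) largely outweighs one measured in hundreds (SavingsReversed). Accordingly, the printed cluster means differ mainly in ProviderID. If both columns are meant to count equally, a common option is to standardize them first with scale(). If ProviderID is an identifier rather than a measurement, it may be worth reconsidering whether it belongs in the distance at all. The hedged sketch below runs on simulated data: `toy`, its values, and the choice of 4 clusters are made up, not taken from rr0.

```r
# Simulated data on very different scales (made-up values)
set.seed(7)
toy <- data.frame(
  SavingsReversed = runif(200, 0, 1000),   # hundreds
  ProviderID      = runif(200, 1e5, 6e6)   # millions
)

# Unscaled: distances, and therefore the clusters, are dominated by ProviderID
km_raw <- kmeans(toy, centers = 4, nstart = 10)

# Scaled: each column is centred to mean 0 and divided by its standard deviation
toy_scaled <- scale(toy)
km_scaled <- kmeans(toy_scaled, centers = 4, nstart = 10)

# Cluster means reported on the original scale, computed from the labels
aggregate(toy, by = list(cluster = km_scaled$cluster), FUN = mean)
```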
>
> Confidentiality Notice This message is sent from Zelis. ...{{dropped:13}}
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
