[R] Help with Kmeans output and using broom to tidy etc..

Mon May 11 17:58:38 CEST 2020

#RStudio Version Version 1.2.1335 need this one--> 1.2.5019
sessionInfo() 
# R version 4.0.0 Patched (2020-05-03 r78349)
#Platform: x86_64-w64-mingw32/x64 (64-bit)
#Running under: Windows 10 x64 (build 17763)

Hello:

I have data that I am trying to manipulate for Kmeans clustering.

Original data looks like this

str(geo1) 
# 'data.frame':	2352 obs. of  5 variables:
# $ ID: Factor w/ 2352 levels "101040199600",..: 590 908 976 509 1674 690 1336 86 726 1702 ...
# $ state           : Factor w/ 41 levels "AL","AR","AZ",..: 32 10 25 11 9 32 13 31 12 12 ...
# $ city            : Factor w/ 1337 levels "ABBOTTSTOWN",..: 932 156 230 698 965 1330 515 727 1127 1304 ...
# $ latitude        : num  40.4 31.2 40.8 42.1 26.8 ...
# $ longitude       : num  -79.9 -81.5 -74 -91.6 -82.1 ...

I created a subset adding column prop_of_total 
str(trnd1_tbl)
tibble [1,457 x 5] (S3: tbl_df/tbl/data.frame)
 $ city         : Factor w/ 1337 levels "ABBOTTSTOWN",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ state        : Factor w/ 41 levels "AL","AR","AZ",..: 32 36 10 28 12 36 10 11 26 38 ...
 $ Basecountsum : num [1:1457] 2352 2352 2352 2352 2352 ...
 $ Basecount2   : num [1:1457] 1 1 1 1 1 2 1 1 2 1 ...
 $ prop_of_total: num [1:1457] 0.000425 0.000425 0.000425 0.000425 0.000425 ...

Then I spread it

trnd2_tbl <- trnd1_tbl %>% 
    dplyr::select(city, state, prop_of_total) %>% 
    spread(key = city, value = prop_of_total, fill = 0) #remove the NA's with fill

str(trnd2_tbl)#tibble [41 x 1,338] (S3: tbl_df/tbl/data.frame)

Then I run a Kmeans

kmeans_obj1 <- trnd2_tbl  %>% 
  dplyr::select(- state) %>% 
  kmeans(centers = 20, nstart = 100)

str(kmeans_obj1)
List of 9
 $ cluster     : int [1:41] 11 11 9 11 11 4 11 11 16 2 ...
 $ centers     : num [1:20, 1:1337] 0 0 0 0 0 0 0 0 0 0 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:20] "1" "2" "3" "4" ...
  .. ..$ : chr [1:1337] "ABBOTTSTOWN" "ABILENE" "ACWORTH" "ADAMS" ...
 $ totss       : num 0.00158
 $ withinss    : num [1:20] 0 0 0 0 0 0 0 0 0 0 ...
 $ tot.withinss: num 0.0000848
 $ betweenss   : num 0.0015
 $ size        : int [1:20] 1 1 1 1 1 1 1 1 1 1 ...
 $ iter        : int 3
 $ ifault      : int 0
 - attr(*, "class")= chr "kmeans"

Then I go and try to tidy:

#Tidy, glance, augment
#Just makes it easier to use or view the obj's in the obj list

  broom::tidy(kmeans_obj1) %>% glimpse()

	broom::glance(kmeans_obj1)
##A tibble: 1 x 4
# totss tot.withinss betweenss  iter
# <dbl>        <dbl>     <dbl> <int>
#   1 0.00158    0.0000848   0.00150     3

However, when I run this piece I get an error:

broom::augment(kmeans_obj1, trnd2_tbl) %>% 
  dplyr::select(city, .cluster)             

#Error: Must subset columns with a valid subscript vector.
# The subscript has the wrong type `data.frame<
 # u: double
#  x: double
>`.
i It must be numeric or character.

Here is the back trace:

rlang::last_error()

# Backtrace:
#   1. broom::augment(kmeans_obj1, trnd2_tbl)
# 9. dplyr::select(., city, .cluster)
# 11. tidyselect::vars_select(tbl_vars(.data), !!!enquos(...))
# 12. tidyselect:::eval_select_impl(...)
# 20. tidyselect:::vars_select_eval(...)
# 21. tidyselect:::walk_data_tree(expr, data_mask, context_mask)
# 22. tidyselect:::eval_c(expr, data_mask, context_mask)
# 23. tidyselect:::reduce_sels(node, data_mask, context_mask, init = init)
# 24. tidyselect:::walk_data_tree(new, data_mask, context_mask)
# 25. tidyselect:::as_indices_sel_impl(...)
# 26. tidyselect:::as_indices_impl(x, vars, strict = strict)
# 27. vctrs::vec_as_subscript(x, logical = "error")

I am not sure what I am supposed to fix?

Maybe someone has had similar error and can advise me please?

Thank you.

WHP

Proprietary

NOTICE TO RECIPIENT OF INFORMATION:\ This e-mail may con...{{dropped:16}}