[R] Fwd: problem with kmeans
Peter Langfelder
peter.langfelder at gmail.com
Tue Apr 29 06:44:25 CEST 2014
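You are using the wrong algorithm. You want Partitioning Around
Medoids (PAM, function pam), not k-means. kmeans() treats each row of
the matrix you give it as the coordinates of a point, not as distances
to the other items, so it is not clustering on your string distances.
PAM is also known as k-medoids, which is where the confusion may come from.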
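Use
library(cluster)
cl = pam(as.dist(dis), 4)
and see if you get what you want. The as.dist() call matters: given a
plain matrix, pam() treats the rows as observations rather than as
dissimilarities.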
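A minimal, self-contained sketch of this on the example data follows. It is an illustration under two assumptions that are not in the original post: base R's adist() stands in for stringdistmatrix(test, test, method = "lv") so that no extra package is needed, and split() replaces the for loop. Whether the four clusters match the JMP/Minitab grouping exactly is not guaranteed, since some of these strings are equally close to two groups.

```r
library(cluster)   # provides pam(); a recommended package shipped with R

test <- c('hematolgy', 'hemtology', 'oncology', 'onclogy',
          'oncolgy', 'dermatolgy', 'dermatoloy', 'dematology',
          'neurolog', 'nerology', 'neurolgy', 'nerology')

# Pairwise Levenshtein distances as a 12 x 12 matrix.
# Assumption: adist() is used here in place of stringdistmatrix().
dis <- adist(test)

# as.dist() marks the matrix as dissimilarities, so pam() clusters on
# the distances themselves instead of treating rows as coordinates.
cl <- pam(as.dist(dis), 4)

# Collect the strings belonging to each cluster label.
grp_cl <- split(test, cl$clustering)
grp_cl
```

The medoids themselves can be inspected with test[cl$id.med], which shows the string that PAM picked as the representative of each group.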
HTH,
Peter
On Mon, Apr 28, 2014 at 9:15 PM, cassie jones <cassiejones26 at gmail.com> wrote:
> Dear R-users,
>
> I am trying to run kmeans on a set of 100 observations, but R
> somehow cannot recover the true underlying groups, although other
> software such as JMP and Minitab produces the desired result.
>
> Following is a brief example of what I am doing.
>
> library(stringdist)
> test=c('hematolgy','hemtology','oncology','onclogy',
> 'oncolgy','dermatolgy','dermatoloy','dematology',
> 'neurolog','nerology','neurolgy','nerology')
>
> dis=stringdistmatrix(test,test, method = "lv")
>
> set.seed(123)
> cl=kmeans(dis,4)
>
>
> grp_cl=vector('list',4)
>
> for(i in 1:4)
> {
> grp_cl[[i]]=test[which(cl$cluster==i)]
> }
> grp_cl
>
> [[1]]
> [1] "oncology" "onclogy"
>
> [[2]]
> [1] "neurolog" "nerology" "neurolgy" "nerology"
>
> [[3]]
> [1] "oncolgy"
>
> [[4]]
> [1] "hematolgy" "hemtology" "dermatolgy" "dermatoloy" "dematology"
>
> In the above example, the 'test' variable consists of a set of
> terms with various typos, and I am trying to group similar
> words based on their string distance. Unfortunately, kmeans does not
> reproduce the following result, which the other software packages
> produce.
> [[1]]
> [1] "oncology" "onclogy" "oncolgy"
>
> [[2]]
> [1] "neurolog" "nerology" "neurolgy" "nerology"
>
> [[3]]
> [1] "dermatolgy" "dermatoloy" "dematology"
>
> [[4]]
> [1] "hematolgy" "hemtology"
>
>
> Does anyone know if there is a way around this? I have heard from a lot of people
> that multivariate analysis in R does not produce the desired result most of
> the time. Any help is really appreciated.
>
>
> Thanks in advance.
>
>
> Cassie
>