[R] Outlier Detection with k-Means

William Dunlap wdunlap at tibco.com
Wed May 7 17:35:18 CEST 2014


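Try replacing your order() call with the following two lines:
    # mean distance to the centre within each observation's own cluster
    meanClusterRadius <- ave(distances, kmeans.result$cluster, FUN = mean)
    # rank by relative distance (distance / mean cluster radius)
    outliers <- order(distances/meanClusterRadius, decreasing = TRUE)[1:5]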
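ave(x, group, FUN = fun) applies FUN to each subset of x defined by the
group argument(s) and puts the result of FUN(x[group == g]) back into
x[group == g] for each group level g, returning the modified x. With
FUN = mean, every element therefore holds the mean distance of its own
cluster, so distances/meanClusterRadius is the relative distance.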
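To make the behaviour of ave() concrete, here is a small sketch on made-up numbers (the vectors `x` and `g` are hypothetical, standing in for `distances` and `kmeans.result$cluster`). Each element is replaced by the mean of its own group, so the result has the same length and order as `x`, which is what lets the element-wise division line up.

```r
# Hypothetical toy data: five "distances" belonging to two groups (clusters)
x <- c(1, 3, 10, 20, 30)
g <- c(1, 1, 2, 2, 2)

# ave() returns a vector as long as x; each element holds the mean of its group
m <- ave(x, g, FUN = mean)
print(m)
# [1]  2  2 20 20 20

# Relative distance: each value divided by its own group's mean
print(x / m)
# [1] 0.5 1.5 0.5 1.0 1.5

# Equivalent result via tapply(): one mean per group, expanded back by label
m2 <- as.vector(tapply(x, g, mean)[as.character(g)])
stopifnot(isTRUE(all.equal(m, m2)))
```

The tapply() version needs the extra indexing step to map one-value-per-group back to one-value-per-observation; ave() does that expansion for you.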
Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Wed, May 7, 2014 at 1:34 AM, marioger <mario_wiegand at gmx.de> wrote:
> Hi,
>
> I am hoping you can help me with my problem. I am trying to detect outliers
> using the kmeans algorithm. First I run the algorithm and choose as
> possible outliers those objects which have a large distance to their cluster
> center. Instead of the absolute distance I want to use the relative
> distance, i.e. the object's absolute distance to its cluster center divided
> by the average distance of all objects in that cluster to their cluster
> center. The code for outlier detection based on absolute distance is
> the following:
>
>> # remove species from the data to cluster
>> iris2 <- iris[,1:4]
>> kmeans.result <- kmeans(iris2, centers=3)
>> # cluster centers
>> kmeans.result$centers
>> # calculate distances between objects and cluster centers
>> centers <- kmeans.result$centers[kmeans.result$cluster, ]
>> distances <- sqrt(rowSums((iris2 - centers)^2))
>> # pick top 5 largest distances
>> outliers <- order(distances, decreasing = TRUE)[1:5]
>> # who are outliers
>> print(outliers)
>
> But how can I use the relative instead of the absolute distance to find
> outliers?
> Thanks in advance.
>
> Mario
>
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
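Putting the question and the reply together, here is a minimal end-to-end sketch that ranks the iris rows by relative distance to their k-means centre. The wrapper name `relative_outliers()`, its arguments, and the seed value are assumptions for illustration, not part of either post. `set.seed()` is added only because kmeans() picks random starting centres, so without it the flagged rows can change between runs.

```r
# Minimal sketch: rank observations by relative distance to their k-means centre.
# relative_outliers() is a hypothetical helper, not an existing R function.
relative_outliers <- function(data, centers = 3, n = 5) {
  km <- kmeans(data, centers = centers)
  # centre of the cluster each row was assigned to (one row per observation)
  ctr <- km$centers[km$cluster, ]
  # absolute Euclidean distance from each observation to its own centre
  dist_abs <- sqrt(rowSums((data - ctr)^2))
  # mean distance within each cluster, expanded back to one value per row
  radius <- ave(dist_abs, km$cluster, FUN = mean)
  dist_rel <- dist_abs / radius
  list(outliers = order(dist_rel, decreasing = TRUE)[seq_len(n)],
       relative = dist_rel,
       cluster  = km$cluster)
}

set.seed(42)  # kmeans() uses random starts; fix the seed for reproducibility
res <- relative_outliers(iris[, 1:4], centers = 3, n = 5)
print(res$outliers)
```

Because ave() keeps its result aligned with `dist_abs`, no merge or lookup by cluster label is needed. By construction the relative distances average to 1 within each cluster, so a value of 2 means twice that cluster's mean distance to its centre.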