[R] K Means Clustering Weighted by Frequency

Bill.Venables at csiro.au Bill.Venables at csiro.au
Tue Feb 5 23:49:37 CET 2008

kmeans() has no weights argument.  Since your weights are frequencies,
though, there is a slightly inelegant way of handling it.  You need to
unwind the frequencies and let each point enter the calculation
separately, once per occurrence, so that a point seen n times pulls on
its cluster centre n times as hard.  (OK, very inelegant!)

### expanded version: row i of a is repeated a[i, 3] times
A <- a[rep(seq_len(nrow(a)), times = a[, 3]), 1:2]
km <- kmeans(A, centers = 52)
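
To see that the expansion really does act as a frequency weighting, here
is a small sketch on made-up data.  The data frame, its column names
(x, y, freq) and the choice of 2 centres are assumptions for
illustration only, not the poster's data.  After clustering the expanded
rows, each centre should match the frequency-weighted mean of the
original rows assigned to it.

```r
# Toy data (hypothetical): coordinates plus how often each one occurs
a <- data.frame(
  x    = c(0, 1, 0, 10, 11, 10),
  y    = c(0, 0, 1, 10, 10, 11),
  freq = c(5, 1, 1,  1,  3,  2)
)

# Repeat each row index freq times and keep only the coordinates
A <- a[rep(seq_len(nrow(a)), times = a$freq), c("x", "y")]

set.seed(1)
km <- kmeans(A, centers = 2, nstart = 5)

# Map cluster labels back to the original rows: cumsum(freq) indexes
# the last expanded copy of each original row
a$cluster <- km$cluster[cumsum(a$freq)]

# Frequency-weighted mean of the original rows in each cluster
weighted_centers <- t(sapply(split(a, a$cluster), function(d) {
  c(x = weighted.mean(d$x, d$freq), y = weighted.mean(d$y, d$freq))
}))

# Should be TRUE: the centres are the weighted means
all.equal(weighted_centers, km$centers, check.attributes = FALSE)
```

Note that kmeans() needs at least as many distinct points as centres, so
repeating rows does not help if a has fewer distinct coordinates than
the number of clusters requested.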

If sum(a[, 3]) is huge, which is often the case when you go to
frequencies, you may want to trim things a bit and deal with samples
from the lot, but that's another story.
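
One possible way to do that trimming, sketched on simulated data: draw
row indices with probability proportional to the frequencies instead of
building the full expansion.  The simulated matrix, the sample size
n_draw and the number of centres below are all assumptions chosen for
illustration.

```r
set.seed(42)

# Hypothetical stand-in for the data: 200 coordinates with large counts,
# so the full expansion would have roughly a million rows
a <- cbind(x = runif(200), y = runif(200), freq = rpois(200, lambda = 5000))

# Sample rows with replacement, with probability proportional to the
# frequency column, rather than repeating every row a[i, 3] times
n_draw <- 2000
idx    <- sample.int(nrow(a), size = n_draw, replace = TRUE, prob = a[, 3])
A_sub  <- a[idx, 1:2]

km_sub <- kmeans(A_sub, centers = 5, nstart = 5, iter.max = 50)
```

The centres from a sample are only an approximation to those from the
full expansion, and they will vary from one draw to the next.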

Bill Venables.


-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Aylward, Jesse
Sent: Wednesday, 6 February 2008 7:17 AM
To: r-help at r-project.org
Subject: [R] K Means Clustering Weighted by Frequency

Apologies if this is not the right way to ask a question; this is my
first time posting here.

Does anyone have a solution to this?  I'm having trouble figuring out
how to use weighting with K Means Clustering. 

So say if my dataset is: 
Column 1 = x coords 
Column 2 = y coords 
Column 3 = frequency each coordinate occurs 

So I'm basically trying to weight the points more heavily if they occur
more frequently. 

I've been trying 

kmeans(a[,1:2], centers=52, weights=a[,3]) 

It works well before adding in the weights, but not once they are
included. It also doesn't work with
"weights=c(frequency 1, frequency 2, ...)" and a few others I've tried.
Maybe I don't know how to search the previous topics or the software
help well enough yet, but I haven't come across an example that lays out
weighting.

Thank you in advance to anyone who has the answer. 

