[R] cluster analysis and supervised classification: an alternative to knn1?

Ulrich Bodenhofer bodenhofer at bioinf.jku.at
Thu May 27 14:30:59 CEST 2010


>
> I had a look at the documentation of the package apcluster.
> That's interesting but do you have any example using it with both
> categorical
> and numerical variables? I'd like to test it with a large dataset..
>
Your posting has opened my eyes: problems where both numerical and
categorical features occur are probably among the most attractive
applications of affinity propagation. So I am considering including such an
example in a future release.

Here is a very crude example (download the imports-85.data from
http://archive.ics.uci.edu/ml/machine-learning-databases/autos/ first):

    library(cluster)
    library(apcluster)
    # stringsAsFactors=TRUE: daisy() needs factors, not character columns
    automobiles <- read.table("imports-85.data", header=FALSE, sep=",",
                              na.strings="?", stringsAsFactors=TRUE)
    sim <- -as.matrix(daisy(automobiles))
    apcluster(sim)

The most essential part here is to use daisy() from the package "cluster"
for computing distances/similarities. Have a look at the help page of
daisy() to get a better idea of how it works and how to tailor the
distance/similarity calculations to your needs.
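
As a rough illustration of what daisy() does with mixed data, here is a
minimal sketch on a made-up data frame. The column names and values are
hypothetical and are not taken from imports-85.data. It requests Gower's
coefficient explicitly, which daisy() otherwise selects on its own as soon
as a column is not numeric.

```r
library(cluster)

# Toy mixed-type data (hypothetical values, for illustration only)
cars <- data.frame(
  fuel  = factor(c("gas", "gas", "diesel", "diesel")),
  doors = factor(c("two", "four", "four", NA)),
  price = c(13495, 16500, 25552, 31600),
  hp    = c(111, 154, 123, NA)
)

# Gower's coefficient handles factors, numeric columns and missing values;
# the resulting dissimilarities lie between 0 and 1.
d <- daisy(cars, metric = "gower")

# Negated dissimilarities serve as similarities for apcluster()
sim <- -as.matrix(d)
print(round(sim, 3))
```

The "type" and "weights" arguments of daisy() are the usual places to start
when individual variables should be treated differently or count more than
others.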

I do not know whether this is a good data set for clustering. Affinity
propagation produces quite a number of clusters. Fiddling with the input
preferences may be necessary (see Section 4 of the vignette of the package
"apcluster").
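
One possible way of fiddling with the input preferences is sketched below
on simulated toy data (again hypothetical, not the automobile data). It
assumes the "q" argument of apcluster(), which sets the input preference to
a quantile of the similarities: the default corresponds to the median, and
q=0 to the minimum, which usually, though not necessarily, gives fewer
clusters.

```r
library(cluster)
library(apcluster)

# Simulated mixed-type data: one factor and one numeric column (made up)
set.seed(1)
toy <- data.frame(
  group = factor(rep(c("a", "b"), each = 20)),
  value = c(rnorm(20, mean = 0), rnorm(20, mean = 5))
)
sim <- -as.matrix(daisy(toy, metric = "gower"))

# Default: input preference is the median of the similarities
res_default <- apcluster(sim)

# q = 0: input preference is the minimum of the similarities
res_low <- apcluster(sim, q = 0)

# Compare the number of exemplars (one per cluster) of both runs
cat("default:", length(res_default@exemplars), "clusters\n")
cat("q = 0:  ", length(res_low@exemplars), "clusters\n")
```

An explicit preference value can be passed via the "p" argument instead if
a quantile is too coarse a handle.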

Best regards,
Ulrich

