[R] Cluster prediction from factor/numeric datasets
Prof Brian Ripley
ripley at stats.ox.ac.uk
Mon Jul 23 23:25:51 CEST 2007
You can't do discriminant analysis without a quadratic metric in a
Euclidean space. 'Scott Bearer' explicitly does not want to assume that
sort of distance measure.
I am not sure how he used Agnes to form 20 clusters: it forms a
hierarchical clustering, so it really is not possible to predict from the
results of such a clustering (you probably would not even predict the
current cluster membership).
With a method such as kmeans or PAM, there is a chance to predict: you
allocate new units to the nearest cluster centre. With PAM you can do
this easily by computing a matrix of dissimilarities from the new points
to the cluster centres (the medoids) and using which.min on each row.
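
The nearest-medoid allocation described above can be sketched as follows.
This is only an illustration under stated assumptions: the data frames
`train` and `new` are made up, and `gower_to_medoids` is a hypothetical
helper (not part of the cluster package) that recomputes Gower's
coefficient by hand for numeric and unordered factor columns, scaling
numeric columns by the ranges seen in the training data.

```r
library(cluster)

set.seed(1)
# Hypothetical mixed-type training data: one numeric and one factor column
train <- data.frame(
  x = c(rnorm(20, mean = 0), rnorm(20, mean = 5)),
  f = factor(rep(c("a", "b"), each = 20), levels = c("a", "b"))
)

# Gower dissimilarities, then PAM on the dissimilarity object
d   <- daisy(train, metric = "gower")
fit <- pam(d, k = 2)

# Rows of the training data that serve as medoids
medoids <- train[fit$id.med, , drop = FALSE]

# Ranges of the numeric columns in the training data (used for scaling)
ranges <- lapply(train[sapply(train, is.numeric)],
                 function(col) diff(range(col)))

# Hypothetical helper: Gower dissimilarity from each row of `new` to each
# medoid. Numeric columns contribute |a - b| / range, factor columns
# contribute 0 (match) or 1 (mismatch); contributions are averaged.
gower_to_medoids <- function(new, medoids, ranges) {
  out <- matrix(0, nrow(new), nrow(medoids))
  for (v in names(new)) {
    a <- new[[v]]
    b <- medoids[[v]]
    if (is.numeric(a)) {
      dv <- abs(outer(a, b, "-")) / ranges[[v]]
    } else {
      dv <- outer(as.character(a), as.character(b), "!=") * 1
    }
    out <- out + dv
  }
  out / ncol(new)
}

# Hypothetical new units to allocate
new <- data.frame(
  x = c(-0.5, 5.5),
  f = factor(c("a", "b"), levels = c("a", "b"))
)

D_new    <- gower_to_medoids(new, medoids, ranges)
pred_new <- apply(D_new, 1, which.min)   # index of the nearest medoid

# Sanity check: the same rule applied to the training data
pred_train <- apply(gower_to_medoids(train, medoids, ranges), 1, which.min)
```

The helper ignores missing values, ordered factors and variable weights,
all of which daisy handles, so it would need extending for real data.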
On Mon, 23 Jul 2007, ngottlieb at marinercapital.com wrote:
> Scott:
>
> Suggest you look at using Discriminant Analysis (don't know which R
> package has it).
> Take the clusters created and, using Discriminant Analysis, get Fisher
> scores for the clusters.
If you mean linear discriminant analysis, package MASS. But there are
many other classification techniques, many preferable to LDA and which
allow non-Euclidean spaces of observations.
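
For the purely numeric case, the approach suggested in the quoted message
might look like the sketch below, using lda from MASS. The choice of the
built-in iris measurements, kmeans with three centres, and the odd/even
split into "old" and "new" units are assumptions made for illustration
only; it does not address factor columns or Gower dissimilarities.

```r
library(MASS)

set.seed(1)
num <- iris[, 1:4]                      # numeric columns only

# Cluster labels to be treated as classes (assumed: kmeans, 3 centres)
cl <- factor(kmeans(num, centers = 3, nstart = 10)$cluster)

# Pretend the odd rows are the clustered data and the even rows are new
train_idx <- seq(1, nrow(num), by = 2)

# Fit LDA with the cluster labels as the grouping
fit_lda <- lda(num[train_idx, ], grouping = cl[train_idx])

# Allocate the held-out units to clusters
pred <- predict(fit_lda, num[-train_idx, ])$class

# Agreement with the labels the clustering itself gave those units
agreement <- mean(pred == cl[-train_idx])
```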
> Then you can take the new dataset and apply the Fisher scores to see
> which defined cluster each new observation will be classified into.
>
> Neil
>
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Scott Bearer
> Sent: Monday, July 23, 2007 1:39 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Cluster prediction from factor/numeric datasets
>
> Hi all,
>
> I have a dataset with numeric and factor columns of data for which I
> developed a Gower Dissimilarity Matrix (Daisy) and used Agglomerative
> Nesting (Agnes) to develop 20 clusters.
>
> I would like to use the 20 clusters to determine cluster membership for
> a new dataset (using predict) but cannot find a way to do this (no way
> to "predict" in the cluster package).
>
> I know I can use "predict" in cclust, kcca, and flexclust- but these
> algorithms do not permit factor data or use a Gower dissimilarity
> matrix, so are unusable to me.
>
> Any suggestions?
>
> Thanks in advance,
>
> Scott
>
> Scott Bearer, Ph.D.
> Forest Ecologist
> The Nature Conservancy
> in Pennsylvania
> Community Arts Center
> 220 West Fourth Street, 3rd Floor
> Williamsport, PA 17701
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595