Validating visual clusters in large datasets: Fixed point clusters of spectral features
Christian Hennig and Norbert Christlieb
January 2002
Abstract
Finding clusters in large datasets is a difficult task. Almost all
computationally feasible methods are related to $k$-means and need a clear
partition structure of the data, while most such datasets contain masking
outliers and other deviations from the usual models of partitioning cluster
analysis. It is possible to
look for clusters informally using graphical tools like
the grand tour, but the meaning and the validity of
such patterns are unclear. In this paper, a three-step approach is suggested:
In the first step data visualization methods like the grand tour are used
to find cluster candidate subsets of the data. In the second step,
reproducible clusters are generated from them by means of fixed point
clustering, a method to find a single cluster at a time
based on the Mahalanobis distance. In the third step,
the validity of the clusters is assessed by use of classification plots.
The approach is applied to an astronomical dataset of spectra from the
Hamburg/ESO survey.
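
The second step can be illustrated with a minimal sketch. It is a simplified reading of the idea, not the procedure used in the report: starting from a visually chosen candidate subset, estimate mean and covariance, redefine the cluster as all points whose squared Mahalanobis distance lies below a cutoff, and repeat until the subset reproduces itself. The function names, the plain mean and covariance estimators, and the `cutoff` parameter are assumptions made for illustration; the report's fixed point clusters may use a different outlier rule and tuning constants.

```python
import numpy as np


def mahalanobis_sq(X, mean, cov):
    """Squared Mahalanobis distance of every row of X to (mean, cov)."""
    diff = X - mean
    solved = np.linalg.solve(cov, diff.T).T
    return np.einsum("ij,ij->i", diff, solved)


def fixed_point_cluster(X, start_mask, cutoff, max_iter=100):
    """Iterate 'estimate, then re-select' until the subset stops changing.

    X          : (n, p) data matrix
    start_mask : boolean vector marking the candidate subset (assumed to
                 come from visual inspection, e.g. the grand tour)
    cutoff     : threshold on the squared Mahalanobis distance
                 (hypothetical tuning constant, e.g. a chi-square quantile)
    Returns the final boolean mask and whether a fixed point was reached.
    """
    X = np.asarray(X, dtype=float)
    mask = np.asarray(start_mask, dtype=bool).copy()
    for _ in range(max_iter):
        members = X[mask]
        if len(members) <= X.shape[1]:
            raise ValueError("too few points to estimate a covariance matrix")
        mean = members.mean(axis=0)
        cov = np.atleast_2d(np.cov(members, rowvar=False))
        new_mask = mahalanobis_sq(X, mean, cov) <= cutoff
        if np.array_equal(new_mask, mask):
            return mask, True  # the subset reproduces itself
        mask = new_mask
    return mask, False
```

For two-dimensional data, a cutoff of about 9.21 corresponds to the 0.99 quantile of the chi-square distribution with two degrees of freedom. A subset returned with the flag `True` is reproducible in the sense that re-estimating from it selects exactly the same points, which is what distinguishes it from an informally drawn candidate.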