ML estimation based on mixtures of Normal distributions is a
widely used tool for cluster analysis. However, a single outlier can break
down the parameter estimation of at least one of the mixture
components. Among others, the estimation of mixtures of t-distributions
(McLachlan and Peel 2000) and the addition of a further mixture component
accounting for "noise" (Fraley and Raftery 1998) have been suggested as more
robust alternatives. In this paper, the definition of an adequate
robustness measure for cluster analysis is discussed and bounds on the
breakdown points of the mentioned methods are given. It turns out that the two
alternatives, while adding stability in the presence of outliers of
moderate size, do not possess a substantially
better breakdown behavior than estimation
based on Normal mixtures. If the number of clusters s is treated as
fixed, r additional points suffice for all three methods to let the
parameters of r clusters explode, unless r = s, in which case this is
not possible for t-mixtures. The ability to estimate the number of
mixture components, e.g., by use of the Bayesian Information Criterion
(Schwarz 1978), and to isolate gross outliers as clusters of one point, is
crucial for a better breakdown behavior of all three
techniques. Furthermore, a sensible restriction of the parameter
space to prevent singularities is discussed, and a mixture of Normals with an
improper uniform component is proposed for more robustness in the case
of a fixed number of components.
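To illustrate the breakdown behavior discussed above, here is a minimal numerical sketch (not taken from the paper; it uses scikit-learn's GaussianMixture as a stand-in for ML estimation of Normal mixtures, with made-up data and an arbitrary outlier position). With the number of components held fixed, a single gross outlier captures one component; letting BIC choose the number of components instead isolates the outlier as a one-point cluster:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Two well-separated one-dimensional Gaussian clusters.
X = np.concatenate([rng.normal(0.0, 1.0, 100),
                    rng.normal(10.0, 1.0, 100)]).reshape(-1, 1)

# The same data with a single gross outlier added.
X_out = np.vstack([X, [[1e6]]])

# Number of components fixed at s = 2: one fitted component chases the
# outlier, and the remaining component has to cover both real clusters.
gm_clean = GaussianMixture(n_components=2, random_state=0).fit(X)
gm_broken = GaussianMixture(n_components=2, random_state=0).fit(X_out)
print("means without outlier:", np.sort(gm_clean.means_.ravel()))
print("means with one outlier:", np.sort(gm_broken.means_.ravel()))

# Estimating the number of components via BIC (Schwarz 1978) instead:
# the selected model typically adds a one-point component for the
# outlier, leaving the two real cluster means intact.
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X_out).bic(X_out)
        for k in (1, 2, 3, 4)}
best_k = min(bics, key=bics.get)
print("BIC-selected number of components:", best_k)
best = GaussianMixture(n_components=best_k, random_state=0).fit(X_out)
print("means at selected k:", np.sort(best.means_.ravel()))
```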
Keywords:
Model-based cluster analysis, robust statistics, mixtures of
t-distributions, Normal mixtures, noise component, classification
breakdown point
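As a second illustration, the proposed mixture of Normals with an improper uniform component can be sketched in one dimension as follows (an assumption-laden illustration, not the paper's exact algorithm: the noise density c, the initialization, and the variance floor are ad hoc choices made here). Because the uniform term has constant density, points far from every Normal component receive high noise responsibility, so a gross outlier cannot drag the Normal parameters away:

```python
import numpy as np

def em_normal_plus_improper_uniform(x, s=2, c=1e-4, n_iter=200):
    """EM for s Normal components plus one improper uniform of density c."""
    n = len(x)
    # Robust initialization so the gross outlier does not dominate it.
    mu = np.quantile(x, np.linspace(0.25, 0.75, s))
    sigma = np.full(s, 1.4826 * np.median(np.abs(x - np.median(x))))
    pi = np.full(s + 1, 1.0 / (s + 1))      # mixing weights; last one = noise
    for _ in range(n_iter):
        # E-step: component densities at every point.
        dens = np.empty((n, s + 1))
        for j in range(s):
            dens[:, j] = pi[j] * np.exp(-0.5 * ((x - mu[j]) / sigma[j]) ** 2) \
                / (np.sqrt(2 * np.pi) * sigma[j])
        dens[:, s] = pi[s] * c              # improper uniform: constant density
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted updates for the Normal components only.
        for j in range(s):
            w = resp[:, j]
            mu[j] = np.sum(w * x) / np.sum(w)
            sigma[j] = np.sqrt(np.sum(w * (x - mu[j]) ** 2) / np.sum(w))
            sigma[j] = max(sigma[j], 1e-3)  # crude guard against singularities
        pi = resp.mean(axis=0)
    return mu, sigma, pi

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(10, 1, 100), [1e6]])
mu, sigma, pi = em_normal_plus_improper_uniform(x, s=2)
print("fitted means:", np.sort(mu))   # stay near 0 and 10 despite the outlier
print("noise proportion:", pi[-1])    # the uniform component absorbs it
```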