Symmetric, asymmetric, and robust linear dimension reduction for classification

Christian Hennig

October 2002

Abstract

This paper discusses nine methods to project a $p$-dimensional dataset with classified points from $s$ known classes onto a lower-dimensional hyperplane so that the classes appear optimally separated. Such projections can be used, e.g., for data visualization and classification in lower dimensions. Classical discriminant coordinates are discussed, as well as methods maximizing mean and variance differences between classes. New methods, which are asymmetric with respect to the numbering of the groups, are introduced for $s=2$. They aim at generating data projections where one class is homogeneous and optimally separated from the other class, while the other class may be widespread. Neighborhood-based methods are also investigated, where local information about the separation of the classes is averaged. The use of robust MCD covariance matrices is suggested. The resulting methods are compared in a simulation study and applied to a 12-dimensional dataset of 74159 spectra of stellar objects.
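As a minimal illustration of the classical baseline mentioned above, the sketch below computes discriminant coordinates (canonical coordinates) by maximizing between-class relative to within-class scatter. This is a hypothetical NumPy-only implementation for the familiar textbook method, not the paper's own code, and it does not cover the asymmetric, neighborhood-based, or MCD-robustified variants the paper introduces.

```python
import numpy as np

def discriminant_coordinates(X, y, dim=1):
    """Project X (n x p) onto `dim` discriminant coordinates.

    Solves the generalized eigenproblem B v = lambda W v, where W is
    the pooled within-class scatter and B the between-class scatter.
    For s classes at most s - 1 coordinates are informative.
    """
    classes = np.unique(y)
    p = X.shape[1]
    overall_mean = X.mean(axis=0)
    W = np.zeros((p, p))  # pooled within-class scatter
    B = np.zeros((p, p))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        W += (Xc - mc).T @ (Xc - mc)
        d = (mc - overall_mean).reshape(-1, 1)
        B += len(Xc) * (d @ d.T)
    # eigenvectors of W^{-1} B, sorted by decreasing eigenvalue
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(W, B))
    order = np.argsort(eigvals.real)[::-1]
    V = eigvecs[:, order[:dim]].real
    return X @ V

# Example: two well-separated Gaussian classes in 5 dimensions
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(3, 1, (50, 5))])
y = np.array([0] * 50 + [1] * 50)
Z = discriminant_coordinates(X, y, dim=1)
print(Z.shape)  # (100, 1)
```

For $s=2$ only one discriminant coordinate carries separation information, which is one motivation the abstract gives for the new asymmetric methods.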

AMS 2000 subject classification: 62-09, 62-07, 62H30
Keywords: visualization, discriminant coordinates, canonical coordinates, nearest neighbor, projection pursuit, cluster validation, MCD estimator
