# [R] criterion for variable selection in LDA

Torsten Hothorn Torsten.Hothorn at rzmail.uni-erlangen.de
Mon Nov 10 14:15:42 CET 2003

On Mon, 10 Nov 2003, Christoph Lehmann wrote:

> Hi
> Since a stepwise variable-selection procedure for LDA (as e.g. in SPSS)
> is not implemented in R, and since I cannot be sure anyway that all the
> assumptions required by, e.g., a procedure based on Wilks' lambda hold
> (such as normality and homogeneity of variances), I would like to ask
> what you would recommend:
>
> Should I, e.g., define a criterion such as the leave-one-out
> cross-validated error rate and then write my own procedure for
> including/removing variables?
>
> Or what would be the gold standard for such a case? (My "case": I have
> 2 groups (n1=30, n2=15), 37 potential variables, and unequal variances
> in the two groups.)
>

Since you suffer from a dimensionality problem (37 variables for only 45
observations), you should consider discriminant analysis methods that can
deal with it. If you are forced to use a linear discriminant analysis, you
can reduce the dimensionality by computing appropriate data-driven linear
scores of the inputs. The idea is described in

@article{multivaria:1998,
key       = {63},
author    = {J\"urgen L\"auter and Ekkehard Glimm and Siegfried Kropf},
title     = {Multivariate Tests Based on Left-Spherically Distributed
Linear Scores},
journal   = {The Annals of Statistics},
pages     = {1972--1988},
year      = {1998},
volume    = {26},
number    = {5},
note      = {Correction: 1999, Vol. 27, p. 1441}
}

and, focusing on discriminant problems,

@book{stabile-mu:1992,
key       = {337},
author    = {J\"urgen L\"auter},
title     = {Stabile multivariate {V}erfahren: {D}iskriminanzanalyse --
{R}egressionsanalyse -- {F}aktoranalyse},
year      = {1992},
}

and

@article{new-multiv:1996,
key       = {66},
author    = {J\"urgen L\"auter and Ekkehard Glimm and Siegfried Kropf},
title     = {New Multivariate Tests for Data with an Inherent
Structure},
journal   = {Biometrical Journal},
pages     = {5--23},
year      = {1996},
volume    = {38},
number    = {1}
}

and

@book{hochdimens:2000,
key       = {74},
author    = {Siegfried Kropf},
title     = {Hochdimensionale multivariate Verfahren in der
medizinischen Statistik},
year      = {2000},
publisher = {Shaker Verlag},
}

and the function `slda' (ipred package) implements it. If you are free to
choose the methodology, the packages `randomForest' (random forests) and
`gbm' (gradient boosting) may help.
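A minimal sketch of the idea in base R plus MASS, for illustration only. This is a simplified, hand-rolled version of the principal-component variant of such data-driven scores, not the code behind ipred's `slda'. The data are simulated (hypothetical) with the dimensions from the question, and the helper names `loo_error' and `stable_scores' are made up for this example. The scores are eigenvectors of the cross-product matrix of the standardized inputs and are computed without looking at the group labels; an ordinary LDA is then fitted to the first q scores. The leave-one-out error rate mentioned in the question comes directly from MASS::lda(..., CV = TRUE).

```r
## Sketch only: simplified data-driven (principal-component) scores + LDA.
## NOT the ipred::slda implementation. The data below are simulated.
library(MASS)  # lda(); MASS ships with R as a recommended package

set.seed(1)
## Hypothetical data with the dimensions from the question
n1 <- 30; n2 <- 15; p <- 37
X <- rbind(matrix(rnorm(n1 * p), n1, p),
           matrix(rnorm(n2 * p, mean = 0.5, sd = 1.5), n2, p))
colnames(X) <- paste0("V", seq_len(p))
grp <- factor(rep(c("A", "B"), c(n1, n2)))

## Leave-one-out error rate of an LDA: with CV = TRUE, lda() returns the
## leave-one-out class predictions in the 'class' component
loo_error <- function(x, g) {
  fit <- lda(x, grouping = g, CV = TRUE)
  mean(fit$class != g)
}

## Hypothetical helper: data-driven linear scores from the eigenvectors of
## the cross-product matrix of the standardized inputs (labels NOT used)
stable_scores <- function(x, q) {
  z <- scale(x)                                  # centre and scale columns
  e <- eigen(crossprod(z), symmetric = TRUE)     # total SSP matrix
  s <- z %*% e$vectors[, seq_len(q), drop = FALSE]
  colnames(s) <- paste0("PC", seq_len(q))
  s
}

err_all <- loo_error(X, grp)                     # LDA on all 37 inputs
err_q <- sapply(1:5, function(q) loo_error(stable_scores(X, q), grp))
print(err_all)
print(err_q)
```

Note that choosing q (or a variable subset) by minimising this same leave-one-out error makes the reported error optimistic; an honest estimate needs the selection step repeated inside each cross-validation fold.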

Best,

Torsten

> many thanks
>
> cheers
>
> christoph
> --
> Christoph Lehmann <christoph.lehmann at gmx.ch>
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>
>