[R] criterion for variable selection in LDA
Torsten Hothorn
Torsten.Hothorn at rzmail.uni-erlangen.de
Mon Nov 10 14:15:42 CET 2003
On Mon, 10 Nov 2003, Christoph Lehmann wrote:
> Hi
> Since a stepwise procedure for variable selection (as e.g. in SPSS) for
> an LDA is not implemented in R, and since I cannot be sure anyway that all
> the assumptions required by e.g. a procedure using a statistic based on
> Wilks' lambda hold (such as normality and variance homogeneity), I would
> like to ask what you would recommend:
>
> Shall I e.g. define a criterion such as the error rate from a
> leave-one-out cross-validation and then write my own procedure for
> including/removing variables?
>
> Or what would be the gold standard for such a case? (My "case" is that
> I have 2 groups (n1=30, n2=15), 37 potential variables, and unequal
> variances in the two groups.)
>
Since you have a dimensionality problem (37 candidate variables but only
45 observations), you should consider discriminant analysis methods that
can deal with it. If you are forced to use a linear discriminant analysis,
you can reduce the dimensionality by first computing appropriate
data-driven linear scores of the inputs and then discriminating on those
scores. The idea is described in
@article{multivaria:1998,
key = {63},
author = {J\"urgen L\"auter and Ekkehard Glimm and Siegfried Kropf},
title = {Multivariate Tests Based on Left-Spherically Distributed
Linear Scores},
journal = {The Annals of Statistics},
pages = {1972--1988},
year = {1998},
volume = {26},
number = {5},
note = {Correction: 1999, Vol. 27, p. 1441}
}
and, focusing on discriminant problems,
@book{stabile-mu:1992,
key = {337},
author = {J\"urgen L\"auter},
title = {Stabile multivariate {V}erfahren: {D}iskriminanzanalyse -
{R}egressionsanalyse - {F}aktoranalyse},
year = {1992},
publisher = {Akademie Verlag},
address = {Berlin}
}
and
@article{new-multiv:1996,
key = {66},
author = {J\"urgen L\"auter and Ekkehard Glimm and Siegfried Kropf},
title = {New Multivariate Tests for Data with an Inherent
Structure},
journal = {Biometrical Journal},
pages = {5--23},
year = {1996},
volume = {38},
number = {1}
}
and
@book{hochdimens:2000,
key = {74},
author = {Siegfried Kropf},
title = {Hochdimensionale multivariate Verfahren in der
medizinischen Statistik},
year = {2000},
publisher = {Shaker Verlag},
address = {Aachen}
}
and the function `slda' (package `ipred') implements it. If you are free to
choose the methodology, the packages `randomForest' (random forests) and
`gbm' (gradient boosting) may help...
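To make the score idea concrete, here is a minimal sketch, written in Python with only the standard library rather than in R. It assumes one simple choice of data-driven score: a standardized sum score whose weights come from the diagonal of the total sums-of-squares matrix, computed without looking at the group labels. The data are simulated to match the sample sizes in the question, the function names `standardized_sum_scores` and `classify_by_midpoint` are hypothetical, and this is not how `slda' is implemented.

```python
import math
import random


def standardized_sum_scores(X):
    """Return one linear score per row: sum_j (x_ij - mean_j) / sqrt(W_jj).

    The weights depend only on the pooled data, never on the group labels.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    # Diagonal of the total sums-of-squares matrix W (labels ignored).
    w_diag = [sum((row[j] - means[j]) ** 2 for row in X) for j in range(p)]
    weights = [1.0 / math.sqrt(w) for w in w_diag]
    return [sum(weights[j] * (row[j] - means[j]) for j in range(p)) for row in X]


def classify_by_midpoint(scores, labels):
    """Assign each score to the group (0 or 1) whose mean score is nearer."""
    group0 = [s for s, y in zip(scores, labels) if y == 0]
    group1 = [s for s, y in zip(scores, labels) if y == 1]
    m0 = sum(group0) / len(group0)
    m1 = sum(group1) / len(group1)
    return [0 if abs(s - m0) <= abs(s - m1) else 1 for s in scores]


# Simulated data shaped like the question: n1 = 30, n2 = 15, 37 variables.
# The mean shift of 0.5 in the second group is an arbitrary assumption.
random.seed(1)
n1, n2, p = 30, 15, 37
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n1)]
X += [[random.gauss(0.5, 1.0) for _ in range(p)] for _ in range(n2)]
labels = [0] * n1 + [1] * n2

scores = standardized_sum_scores(X)          # 37 inputs reduced to 1 score
predicted = classify_by_midpoint(scores, labels)
error_rate = sum(yhat != y for yhat, y in zip(predicted, labels)) / len(labels)
print(f"resubstitution error rate: {error_rate:.3f}")
```

The printed figure is a resubstitution error and therefore optimistic, because the midpoint rule is fitted and evaluated on the same observations. A cross-validated estimate would be the fairer criterion.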
Best,
Torsten
> many thanks
>
> cheers
>
> christoph
> --
> Christoph Lehmann <christoph.lehmann at gmx.ch>