[R] akaike's information criterion
Frank E Harrell Jr
fharrell at virginia.edu
Thu Sep 13 17:19:33 CEST 2001
Especially if you are going to be doing formal statistical
inference but often even just for prediction,
model uncertainty of all types needs to be taken
into account. The use of AIC to select from among
a small set of competing models or to select a single
"tuning constant" such as an overall shrinkage or
penalty factor does not cause many problems. For
what you have suggested, it is possible to be
misled by unrecognized model uncertainty when
entertaining many models and transformations.
The formula for AIC in effect assumes that the model
specification was fixed in advance (non-stochastic),
not chosen by looking at the same data.
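To make the point about entertaining many models concrete, here is a
minimal simulation sketch in Python (standard library only). It is an
illustration under stated assumptions, not something taken from either
reference below: the response is pure noise, every candidate predictor
is pure noise, and the function names gaussian_aic, rss_null,
rss_simple and selection_rate are hypothetical. The AIC used is the
Gaussian form n*log(RSS/n) + 2*p, which drops an additive constant.
With one pre-specified candidate, AIC should prefer the useless
predictor only occasionally; when the best of many candidates is
picked, some noise predictor should beat the intercept-only model far
more often.

```python
import math
import random


def gaussian_aic(rss, n_obs, n_params):
    """Gaussian AIC up to an additive constant: n*log(RSS/n) + 2*p."""
    return n_obs * math.log(rss / n_obs) + 2 * n_params


def rss_null(y):
    """Residual sum of squares of the intercept-only model."""
    mean_y = sum(y) / len(y)
    return sum((v - mean_y) ** 2 for v in y)


def rss_simple(x, y):
    """Residual sum of squares of the least-squares fit y ~ 1 + x."""
    n_obs = len(y)
    mean_x = sum(x) / n_obs
    mean_y = sum(y) / n_obs
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    syy = sum((b - mean_y) ** 2 for b in y)
    return syy - sxy ** 2 / sxx


def selection_rate(n_candidates, n_obs=50, n_sims=500, seed=1):
    """Fraction of simulations in which the best of n_candidates pure-noise
    predictors has a lower AIC than the intercept-only model.

    Assumption for illustration: y and all predictors are independent
    standard normal draws, so no predictor is truly related to y.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sims):
        y = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        # Parameters counted: intercept and error variance.
        aic_null = gaussian_aic(rss_null(y), n_obs, 2)
        best_aic = min(
            # Parameters counted: intercept, slope and error variance.
            gaussian_aic(
                rss_simple([rng.gauss(0.0, 1.0) for _ in range(n_obs)], y),
                n_obs,
                3,
            )
            for _ in range(n_candidates)
        )
        if best_aic < aic_null:
            wins += 1
    return wins / n_sims


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(k, "candidate(s): noise predictor preferred in",
              selection_rate(k), "of simulations")
```

Comparing selection_rate(1) with selection_rate(20) shows how the
search itself, which the AIC penalty does not count, inflates the
apparent support for whichever model is selected.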
See
@ARTICLE{far92cos,
author = {Faraway, J. J.},
year = 1992,
title = {On the cost of data analysis},
journal = {Journal of Computational and Graphical Statistics},
volume = 1,
pages = {213-229},
annote = {bootstrap; validation; predictive accuracy; modeling strategy;
regression diagnostics; model uncertainty}
}
and
@ARTICLE{cha95mod,
author = {Chatfield, C.},
year = 1995,
title = {Model uncertainty, data mining and statistical inference
(with discussion)},
journal = {Journal of the Royal Statistical Society, Series A},
volume = 158,
pages = {419-466},
annote = {bias by selecting model because it fits the data well; bias
in standard errors; P. 420: ... need for a better balance in the
literature and in statistical teaching between {\em techniques} and
problem solving {\em strategies}. P. 421: It is `well known' to be
`logically unsound and practically misleading' (Zhang, 1992) to make
inferences as if a model is known to be true when it has, in fact,
been selected from the {\em same} data to be used for estimation
purposes. However, although statisticians may admit this privately
(Breiman (1992) calls it a `quiet scandal'), they (we) continue to
ignore the difficulties because it is not clear what else could or
should be done. P. 421: Estimation errors for regression coefficients
are usually smaller than errors from failing to take into account
model specification. P. 422: Statisticians must stop pretending that
model uncertainty does not exist and begin to find ways of coping
with it. P. 426: It is indeed strange that we often admit model
uncertainty by searching for a best model but then ignore this
uncertainty by making inferences and predictions as if certain that
the best fitting model is actually true. P. 427: The analyst needs to
assess the model selection {\em process} and not just the best fitting
model. P. 432: The use of subset selection methods is well known to
introduce alarming biases. P. 433: ... the AIC can be highly biased
in data-driven model selection situations. P. 434: Prediction
intervals will generally be too narrow. In the discussion, Jamal R.
M. Ameen states that a model should be (a) satisfactory in
performance relative to the stated objective, (b) logically sound,
(c) representative, (d) questionable and subject to on-line
interrogation, (e) able to accommodate external or expert
information and (f) able to convey information.}
}
Frank Harrell
Thomas Dick wrote:
>
> Hello all,
>
> I hope you don't mind my off-topic question. I want to use the Akaike criterion
> for variable selection in a regression model. Does anyone know some basic
> literature on that topic?
>
> I'm especially interested in answers to the following questions:
> 1. Does the criterion have to be modified (and if so, how) if I also estimate
> the transformations of the variables?
>
> 2. How should the criterion be used if the model includes dummy variables (for
> categorical data)?
>
> 3. Does the criterion have only one minimum, or should I expect several local
> minima?
>
> Thank you in advance
> Thomas
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
--
Frank E Harrell Jr Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine http://hesweb1.med.virginia.edu/biostat