[R-sig-eco] glm-model evaluation

Ruben Roa Ureta rroa at udec.cl
Sun Jun 1 16:19:16 CEST 2008


> We've mostly gotten out of the area where I know enough statistically to
> speak with confidence, but I'll risk some lumps anyway...
>
> I always thought that the idea of retaining a portion of the data for
> validation was a good idea. I asked David Anderson about this
> personally, and he said he couldn't see any reason to do that. Using
> likelihood, he thought the best approach was to use all the data to
> determine the best model.
>
> I'm pretty muddy on the difference between selecting a good model with AIC
> (which is sometimes referred to as being predictive in nature) and what is
> meant by post-hoc validation of predictive ability (aside from testing on
> another data set). I've often seen the "leave-one-out" approach used to
> "validate" a model. If anyone has a good reference that differentiates the
> two with an example, I'd really appreciate it.

I think it is a matter of principles. In my view, statistical inference
theory covers only the estimation of parameters and the prediction of
new data GIVEN a model, whereas model selection requires a larger
theory. The AIC fits this view very well, since Akaike's theorem joins
statistical inference theory with information theory. Together, these
two theories provide the tools for model selection (or model
identification, sensu Akaike).
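
As a minimal sketch of that idea in R (the data and candidate models
here are hypothetical, purely for illustration): fit every candidate
GLM to the full data set by maximum likelihood and rank the candidates
by AIC.

## Simulated data (hypothetical example)
set.seed(1)
n  <- 100
x1 <- runif(n)
x2 <- runif(n)
y  <- rpois(n, exp(0.5 + 1.2 * x1))  # true model uses x1 only
d  <- data.frame(y, x1, x2)

## Candidate models, all fitted to the full data by maximum likelihood
m0 <- glm(y ~ 1,       family = poisson, data = d)
m1 <- glm(y ~ x1,      family = poisson, data = d)
m2 <- glm(y ~ x1 + x2, family = poisson, data = d)

## Model identification via AIC: smaller is better
AIC(m0, m1, m2)
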
I agree with Anderson: I would always use all my data to fit the model
by maximum likelihood. Cross-validation is ad hoc, whereas the AIC is
grounded in solid theory.
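
For contrast, the leave-one-out "validation" mentioned in the question
can be done with cv.glm() from the boot package (continuing from the
sketch above; cv.glm() performs leave-one-out when K is left at its
default of n):

library(boot)
## Leave-one-out cross-validation estimate of prediction error;
## delta[1] is the raw CV estimate under the default squared-error cost
cv.glm(d, m1)$delta[1]
cv.glm(d, m2)$delta[1]

On simulated data like this, the two approaches will usually point to
the same model; the disagreement is about which one has the firmer
theoretical footing.
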
Rubén


