[R-sig-eco] glm-model evaluation

Kingsford Jones kingsfordjones at gmail.com
Sat May 31 00:25:21 CEST 2008


On Thu, May 29, 2008 at 4:51 PM, David Hewitt <dhewitt37 at gmail.com> wrote:
>
>> I'd add that showing predictive ability is very important if the goal of
>> the modeling process is to make predictions (and even if it's not, showing
>> predictive ability provides support for the model).  Frank Harrell has
>> tools in the Design library for efficient internal validation and
>> calibration via the bootstrap (see the 'validate' and 'calibrate'
>> functions) but these will not work on a model produced by glm.nb.  However
>> it's easy to code a cross-validation in R and I believe MASS shows a
>> 10-fold cross-validation for the CPUs example.
>>
>
> IIRC, there's a section in B&A (2002) that points out and demonstrates that
> AIC model selection has the property of being equivalent to "leave one out"
> cross-validation. It draws from an original work by Stone (197x ??). They
> also discuss more involved simulation-based (bootstrap) methods for complex
> models.
>
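To make the cross-validation mentioned in the quoted text concrete, here is a minimal sketch of a 10-fold cross-validation for a glm.nb fit, computing an RMSE on each held-out fold.  The data frame 'dat' and the variables 'count', 'x1', and 'x2' are hypothetical placeholders, not anything from the original thread:

library(MASS)

set.seed(1)
k <- 10
folds <- sample(rep(1:k, length.out = nrow(dat)))  # random fold labels

cv_rmse <- numeric(k)
for (i in 1:k) {
  train <- dat[folds != i, ]
  test  <- dat[folds == i, ]
  fit   <- glm.nb(count ~ x1 + x2, data = train)   # refit on the training folds
  pred  <- predict(fit, newdata = test, type = "response")
  cv_rmse[i] <- sqrt(mean((test$count - pred)^2))  # held-out RMSE for this fold
}
mean(cv_rmse)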

I think that's a different (though not unrelated) issue -- namely,
model selection.  Asymptotically, AIC is equivalent to leave-one-out
cross-validation, Mallows' Cp, and some other model selection
criteria.  However, I don't see using a model selection method as
equivalent to validating the predictive ability of a model.
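Purely to illustrate that distinction, here is a rough sketch that computes both quantities for two candidate models: AIC, which is a model-selection criterion, and a leave-one-out estimate of squared prediction error.  Again, 'dat', 'count', 'x1', and 'x2' are hypothetical:

library(MASS)

## candidate formulas (hypothetical)
forms <- list(m1 = count ~ x1, m2 = count ~ x1 + x2)

## leave-one-out mean squared prediction error for each candidate
loo_mse <- sapply(forms, function(f) {
  errs <- sapply(seq_len(nrow(dat)), function(i) {
    fit <- glm.nb(f, data = dat[-i, ])
    dat$count[i] - predict(fit, newdata = dat[i, , drop = FALSE],
                           type = "response")
  })
  mean(errs^2)
})

## AIC from the full-data fits
aics <- sapply(forms, function(f) AIC(glm.nb(f, data = dat)))

rbind(AIC = aics, LOO_MSE = loo_mse)

The two rows will often rank the candidates the same way, but only the LOO row says anything directly about out-of-sample error.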

As far as how to show predictive ability -- I think that's
context-dependent.  Along with various quantitative measures, I've
found plotting to be useful.  For example, for each fold of a k-fold
cross-validation, plot the observed vs. predicted values in a scatter
plot, using color to identify an important categorical variable
(e.g. sex, species, region) and pch to identify another.  Or, if it's
spatial data, map the RMSEs of the cross-validation folds to get an
idea of where the model is performing well or poorly.  Conditional
plots and parallel coordinate plots can be good tools for these types
of 'validation' as well.  One thing to remember -- if these methods
are used as part of the model selection process, there should be a
final hold-out dataset that was never used in any way in making
modeling decisions.  That's a luxury, but if there's enough data it
can provide strong evidence for the model's predictive performance.
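As a rough sketch of the kind of plot described above (not the original analysis), here is one way to draw an observed-vs-predicted panel for each fold, with col and pch keyed to two hypothetical factors 'sex' and 'region' in a made-up data frame 'dat':

library(MASS)

set.seed(1)
k <- 5
folds <- sample(rep(1:k, length.out = nrow(dat)))

op <- par(mfrow = c(1, k))                 # one panel per fold
for (i in 1:k) {
  train <- dat[folds != i, ]
  test  <- dat[folds == i, ]
  fit   <- glm.nb(count ~ x1 + sex + region, data = train)
  pred  <- predict(fit, newdata = test, type = "response")
  plot(pred, test$count,
       col = as.integer(test$sex),         # colour codes one factor
       pch = as.integer(test$region),      # plotting symbol codes another
       xlab = "Predicted", ylab = "Observed",
       main = paste("Fold", i))
  abline(0, 1, lty = 2)                    # 1:1 line
}
par(op)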

Kingsford


> -----
> David Hewitt
> Research Fishery Biologist
> USGS Klamath Falls Field Station (USA)
> --
> View this message in context: http://www.nabble.com/glm-model-evaluation-tp17525503p17548602.html
> Sent from the r-sig-ecology mailing list archive at Nabble.com.
>
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>


