[R-sig-eco] glm-model evaluation

Ben Bolker bolker at ufl.edu
Thu May 29 17:27:38 CEST 2008


Manuel Spínola wrote:
| Dear list members,
|
| I am fitting negative binomial models with the glm.nb function (MASS
| package).
| I ran several models and did model selection using AIC.
| What is a good way to evaluate how good the selected model is (lowest AIC
| and considerable Akaike weight)?
| Are model diagnostics a good approach?
| Thank you very much in advance.
|
| Best,
|
| Manuel Spínola
|

Manuel,

I'm not absolutely sure what your question is.

If you're talking about evaluating the relative merit of
the selected model, it's a question of delta-AIC (or delta-AICc).
Follow the usual rules of thumb: a difference of <2 means the models
are approximately equivalent, >6 is a lot better, and >10 is so good
that you can probably discard the worse models.  (See Shane Richards'
nice papers on the topic.)
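
As a rough illustration of the delta-AIC comparison, here is a minimal
sketch in R.  Everything in it is an assumption made for the example:
the data are simulated, and the variable names (x1, x2) and the
candidate set are hypothetical.  It tabulates AIC, delta-AIC and Akaike
weights for a handful of glm.nb fits.

```r
library(MASS)  # provides glm.nb()

set.seed(1)

# Simulated data, purely for illustration (not from the original question)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rnbinom(n, mu = exp(0.5 + 0.8 * x1), size = 2)
d  <- data.frame(y, x1, x2)

# Hypothetical candidate set of negative binomial models
fits <- list(
  null = glm.nb(y ~ 1,       data = d),
  x1   = glm.nb(y ~ x1,      data = d),
  x2   = glm.nb(y ~ x2,      data = d),
  both = glm.nb(y ~ x1 + x2, data = d)
)

# AIC, difference from the best model, and Akaike weights
aic    <- sapply(fits, AIC)
delta  <- aic - min(aic)
weight <- exp(-delta / 2) / sum(exp(-delta / 2))

# Sort so that the best-supported model comes first
aic_table <- data.frame(AIC = aic, delta = delta, weight = weight)
aic_table <- aic_table[order(aic_table$delta), ]
print(round(aic_table, 3))
```

The rules of thumb above are then read off the delta column.  For small
samples one would substitute AICc for AIC; the rest of the calculation
is unchanged.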

If you have several models within delta-AIC of 10 (or 6) of each
other, Burnham and Anderson would say you should really be
averaging model predictions etc. rather than selecting a single
best model.
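
One way to average predictions across models can be sketched as
follows.  This is only an illustration under stated assumptions: the
data are simulated, the two candidate models and the prediction grid
(newd) are hypothetical, and averaging is done on the response scale
using Akaike weights, which is one common choice rather than the only
one.

```r
library(MASS)  # provides glm.nb()

set.seed(2)

# Simulated data, purely for illustration
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
d  <- data.frame(x1, x2,
                 y = rnbinom(n, mu = exp(0.5 + 0.8 * x1 + 0.1 * x2), size = 2))

# Hypothetical candidate models that are close in AIC
fits <- list(
  x1   = glm.nb(y ~ x1,      data = d),
  both = glm.nb(y ~ x1 + x2, data = d)
)

# Akaike weights
aic <- sapply(fits, AIC)
w   <- exp(-(aic - min(aic)) / 2)
w   <- w / sum(w)

# New covariate values at which to predict (hypothetical)
newd <- data.frame(x1 = c(-1, 0, 1), x2 = 0)

# One column of predicted means per model, one row per new observation
preds <- sapply(fits, predict, newdata = newd, type = "response")

# Weighted average of the per-model predictions
avg_pred <- as.vector(preds %*% w)
print(cbind(newd, round(preds, 3), averaged = round(avg_pred, 3)))
```

Each averaged prediction necessarily lies between the smallest and
largest of the individual model predictions for that row.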

If you're talking about a global goodness-of-fit test, then the
answer's a little bit different.  You should do the global GOF
evaluation on the most complex model, not on a less complex model
that was selected for having a better AIC.  The standard recipes
for GOF (checking residual deviance etc.) don't work because the
negative binomial soaks up any overdispersion -- these recipes
are geared toward Poisson/binomial data with fixed scale parameters.

You should do the "usual" graphical diagnostic checking on the
most complex model: make sure that relationships are linear on
the scale of the linear predictor, scaled variances are homogeneous,
distributions within groups follow the expected distribution,
and there are no gross outliers or points with large leverage, etc.
plot(model) will show you a lot of these diagnostics.

However, there isn't a simple way to get a p-value for goodness
of fit of the global model in this case.  If this is really
important, you can pick a summary statistic, calculate it for
your observed data and fitted model, then simulate 'data' from the
fitted model many times and calculate the summary statistic for each
simulated data set.  The simulated values represent the null
hypothesis that the data really do come from the fitted model; see
where your observed statistic falls in their distribution.
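
The simulation approach can be sketched roughly as below.  The
specifics are all assumptions for the sake of the example: the data are
simulated, the "global" model formula is hypothetical, and the summary
statistic (the proportion of zeros) is an arbitrary choice.  Because
that statistic depends only on the response, the model is not refitted
to each simulated data set; a statistic built from fitted values (e.g.
a Pearson chi-square) would need a refit inside the loop.

```r
library(MASS)  # provides glm.nb()

set.seed(3)

# Simulated data, purely for illustration
n <- 150
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- rnbinom(n, mu = exp(0.5 + 0.6 * d$x1), size = 1.5)

# The "global" (most complex) model in this hypothetical example
global <- glm.nb(y ~ x1 + x2, data = d)

# Summary statistic: proportion of zero counts (an arbitrary choice)
stat <- function(y) mean(y == 0)
obs_stat <- stat(d$y)

# Simulate new responses from the fitted model and recompute the statistic
mu_hat    <- fitted(global)   # fitted means on the response scale
theta_hat <- global$theta     # estimated dispersion parameter
n_sim     <- 1000
sim_stat  <- replicate(n_sim,
                       stat(rnbinom(n, mu = mu_hat, size = theta_hat)))

# Two-sided Monte Carlo p-value: where does the observed value fall?
p_lo    <- (sum(sim_stat <= obs_stat) + 1) / (n_sim + 1)
p_hi    <- (sum(sim_stat >= obs_stat) + 1) / (n_sim + 1)
p_value <- min(1, 2 * min(p_lo, p_hi))

cat("observed:", round(obs_stat, 3),
    " simulated mean:", round(mean(sim_stat), 3),
    " p-value:", round(p_value, 3), "\n")
```

A histogram of sim_stat with a vertical line at obs_stat is usually
more informative than the p-value alone.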

  cheers
    Ben Bolker