[R-sig-eco] glm-model evaluation
Ben Bolker
bolker at ufl.edu
Thu May 29 20:13:01 CEST 2008
Manuel Spínola wrote:
| Thank you very much, Ben.
| Yes, my question is about what comes after the model selection
| procedure (after using AIC). It is my understanding that modeling
| doesn't finish with finding the best model using IT methods; you
| should also check whether the selected model is a good model, right?
~ Yes, but ... any formal goodness-of-fit examination should
really be done on the full (most complex) model, before trying
to select a model. The idea is that, if the most complex model
is a reasonable fit, then any simpler models that are selected
will fit adequately (because if they didn't fit adequately,
*relative to the most complex model*, they wouldn't be selected ...)
~ That said, there's no reason not to look at the selected
model for adequacy as well -- just that it would be very surprising
if the full model were adequate and the reduced one weren't.
|
| In your message you wrote:
|
| "make sure that relationships are linear on the scale of the linear
| predictor, scaled variances are homogeneous"
|
| What do you mean by "on the scale of the linear predictor"? How can I
| do this in R? What if one of my variables is categorical?
~ This comment only applies to continuous predictors. "On the scale of
the linear predictor" means on the logged scale (in the case of a
negative binomial model, which usually has a log link).
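~ One informal check (a sketch, assuming a fitted model "m" and a
continuous predictor "x" as in the simulation further down): plot the
deviance residuals against the predictor and look for systematic curvature.
## curvature in this plot suggests the relationship is not linear
## on the link (log) scale
plot(x, residuals(m, type = "deviance"))
lines(lowess(x, residuals(m, type = "deviance")), col = 2)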
|
| Also, what do you mean by "scaled variances are homogeneous"? Is that
| an assumption of a GLM?
~ The negative binomial GLM assumes that a single negative binomial k
parameter covers all groups. So the variance in each group, scaled by
the expected variance for that group (given this common k), should be
about the same across groups.
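~ A quick informal check (a sketch, assuming a fitted model "m" and a
grouping factor "f" as in the simulation further down): Pearson
residuals are scaled by the expected standard deviation, so their
variance should be roughly 1 -- and in any case similar -- in every group.
## compare the spread of scaled (Pearson) residuals across groups
tapply(residuals(m, type = "pearson"), f, var)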
|
| Are there any other considerations when using negative binomial
| models? I decided on this type of model because I have overdispersion
| in the Poisson models.
~ Good choice (although Burnham and Anderson 2002 say that
quasi-likelihood approaches usually work just fine, I prefer
NB models where they're feasible).
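~ If you want to double-check the overdispersion itself, a rough rule
of thumb (sketched here with the variables from the simulation below)
is to compare the residual deviance of the Poisson fit to its residual
degrees of freedom:
## a ratio well above 1 suggests overdispersion relative to the Poisson
p <- glm(y ~ x + f, family = poisson)
deviance(p) / df.residual(p)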
| Thank you again Ben.
| Best,
|
| Manuel
~ Here's a little simulation to play with ... try different
random-number seeds to see how big the deviations are when
the model is correct.
set.seed(1002)                                # reproducible example
f <- factor(rep(c("a","b"), each = 40))       # two-level grouping factor
x <- runif(80)                                # continuous predictor
b <- c(1, 3)                                  # group intercepts
eta <- b[f] + 2*x                             # linear predictor (log scale)
y <- rnbinom(80, mu = exp(eta), size = 0.5)   # NB response with k = 0.5
library(lattice)
## offset by 0.1 so zero counts can be plotted on the log scale;
## jitter to separate overlapping counts
xyplot(jitter(y + 0.1, 0.1) ~ x | f, scales = list(y = list(log = TRUE)))
library(MASS)
m <- glm.nb(y ~ x + f)
plot(m)                                       # standard diagnostic plots
|
|
| Ben Bolker wrote:
| Manuel Spínola wrote:
| | Dear list members,
| |
| | I am fitting negative binomial models with the glm.nb function
| | (MASS package).
| | I ran several models and did model selection using AIC.
| | What is a good way to evaluate how good the selected model (lowest
| | AIC and considerable Akaike weight) is?
| | Are model diagnostics a good approach?
| | Thank you very much in advance.
| |
| | Best,
| |
| | Manuel Spínola
| |
|
| ~ Manuel,
|
| ~ I'm not absolutely sure what your question is.
|
| ~ If you're talking about evaluating the relative merit of
| the selected model, it's a question of delta-AIC (or delta-AICc);
| follow the usual rules of thumb -- <2 is approximately equivalent,
| >6 is a lot better, >10 is so much better that you can probably
| discard the worse models. (See Shane Richards' nice papers on the
| topic.)
|
| ~ If you have several models within a delta-AIC of 10 (or 6) of each
| other, Burnham and Anderson would say you should really be averaging
| model predictions (weighted by Akaike weights) rather than selecting
| a single best model; the weight computation is sketched at the end
| of this message.
|
| ~ If you're talking about a global goodness-of-fit test, then the
| answer's a little bit different. You should do the global GOF
| evaluation on the most-complex model, not a less-complex model
| that was selected for having a better AIC. The standard recipes
| for GOF (checking residual deviance etc.) don't work because the
| negative binomial soaks up any overdispersion -- these recipes
| are geared toward Poisson/binomial data with fixed scale parameters.
| You should do the "usual" graphical diagnostic checking on the
| most complex model (make sure that relationships are linear on
| the scale of the linear predictor, scaled variances are homogeneous,
| distributions within groups follow the expected distribution, and
| there are no gross outliers or points with large leverage, etc.);
| plot(model) will show you a lot of these diagnostics.
| However, there isn't a simple way to get a p value for the goodness
| of fit of the global model in this case. If this is really
| important, you can pick a summary statistic, calculate it for your
| fitted model, then simulate 'data' from the fitted model many times
| and calculate the summary statistic for each simulated data set
| (these represent the null hypothesis that the data really do come
| from the fitted model), and see where your observed statistic falls
| in that distribution; a sketch of this appears at the end of this
| message.
|
| ~ cheers
| ~ Ben Bolker
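~ For completeness, here's what the delta-AIC/Akaike-weight
computation mentioned above looks like (the AIC values here are
made-up placeholders):
aic <- c(m1 = 210.2, m2 = 211.5, m3 = 219.8)   # hypothetical AIC values
delta <- aic - min(aic)                        # delta-AIC
w <- exp(-delta/2)/sum(exp(-delta/2))          # Akaike weights
round(cbind(delta, weight = w), 3)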
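~ And a minimal sketch of the simulation-based goodness-of-fit test
described above, reusing the fitted model "m" from the simulation and
(arbitrarily) the sum of squared Pearson residuals as the summary
statistic:
## parametric bootstrap GOF: simulate 'data' from the fitted model,
## refit, and see where the observed statistic falls in the
## simulated (null) distribution
statfun <- function(fit) sum(residuals(fit, type = "pearson")^2)
obs <- statfun(m)
nsim <- 1000                       # reduce if this is too slow
simstat <- numeric(nsim)
for (i in seq_len(nsim)) {
    ysim <- rnbinom(length(y), mu = fitted(m), size = m$theta)
    simstat[i] <- statfun(glm.nb(ysim ~ x + f))  # may warn occasionally
}
mean(simstat >= obs)               # approximate GOF p value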