[R-sig-eco] glm-model evaluation
Ben Bolker
bolker at ufl.edu
Thu May 29 20:13:01 CEST 2008
Manuel Spínola wrote:
| Thank you very much, Ben.
| Yes, my question is about what comes after the model selection
| procedure (after using AIC). It is my understanding that modeling
| doesn't finish with finding the best model using IT methods; you
| should also check whether the selected model is a good model, right?
~ Yes, but ... any formal goodness-of-fit examination should
really be done on the full (most complex) model, before trying
to select a model. The idea is that, if the most complex model
is a reasonable fit, then any simpler models that are selected
will fit adequately (because if they didn't fit adequately,
*relative to the most complex model*, they wouldn't be selected ...)
~ That said, there's no reason not to look at the selected
model for adequacy as well -- just that it would be very surprising
if the full model were adequate and the reduced one weren't.
|
| In your message you wrote:
|
| "make sure that relationships are linear on the scale of the linear
| predictor, scaled variances are homogeneous"
|
| What do you mean by "on the scale of the linear predictor"? How can I
| do this in R? What if one of my variables is categorical?
~ This comment only applies to continuous predictors. "On the scale of
the linear predictor" means on the logged scale (in the case of a
negative binomial model, which usually has a log link).
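~ One informal check (a sketch, assuming a fitted model "m" and a
continuous predictor "x" as in the simulation further down): plot the
deviance residuals against the predictor and look for systematic curvature.
## curvature in this plot suggests the relationship is not linear
## on the link (log) scale
plot(x, residuals(m, type = "deviance"))
lines(lowess(x, residuals(m, type = "deviance")), col = 2)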
|
| Also, what do you mean by "scaled variances are homogeneous"? Is that
| an assumption of a GLM?
~ The negative binomial GLM assumes that a single negative binomial k
parameter covers all groups. So the variance in each group, scaled by
the expected variance for that group (given this common k), should be
about the same across groups.
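~ A quick informal check (a sketch, assuming a fitted model "m" and a
grouping factor "f" as in the simulation further down): Pearson
residuals are scaled by the expected standard deviation, so their
variance should be roughly 1 -- and in any case similar -- in every group.
## compare the spread of scaled (Pearson) residuals across groups
tapply(residuals(m, type = "pearson"), f, var)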
|
| Are there any other considerations when using negative binomial
| models? I decided on this type of model because I have overdispersion
| in the Poisson models.
~ Good choice (although Burnham and Anderson 2002 say that
quasi-likelihood approaches usually work just fine, I prefer
NB models where they're feasible).
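~ If you want to double-check the overdispersion itself, a rough rule
of thumb (sketched here with the variables from the simulation below)
is to compare the residual deviance of the Poisson fit to its residual
degrees of freedom:
## a ratio well above 1 suggests overdispersion relative to the Poisson
p <- glm(y ~ x + f, family = poisson)
deviance(p) / df.residual(p)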
| Thank you again Ben.
| Best,
|
| Manuel
~ Here's a little simulation to play with ... try different
random-number seeds to see how big the deviations are when
the model is correct.
set.seed(1002)                                # reproducible example
f <- factor(rep(c("a","b"), each = 40))       # two-level grouping factor
x <- runif(80)                                # continuous predictor
b <- c(1, 3)                                  # group intercepts
eta <- b[f] + 2*x                             # linear predictor (log scale)
y <- rnbinom(80, mu = exp(eta), size = 0.5)   # NB response with k = 0.5
library(lattice)
## offset by 0.1 so zero counts can be plotted on the log scale;
## jitter to separate overlapping counts
xyplot(jitter(y + 0.1, 0.1) ~ x | f, scales = list(y = list(log = TRUE)))
library(MASS)
m <- glm.nb(y ~ x + f)
plot(m)                                       # standard diagnostic plots
|
|
| Ben Bolker wrote:
| Manuel Spínola wrote:
| | Dear list members,
| |
| | I am fitting negative binomial models with the glm.nb function
| | (MASS package).
| | I ran several models and did model selection using AIC.
| | What is a good way to evaluate how good the selected model (lowest
| | AIC and considerable Akaike weight) is?
| | Are model diagnostics a good approach?
| | Thank you very much in advance.
| |
| | Best,
| |
| | Manuel Spínola
| |
|
| ~ Manuel,
|
| ~ I'm not absolutely sure what your question is.
|
| ~ If you're talking about evaluating the relative merit of
| the selected model, it's a question of delta-AIC (or delta-AICc);
| follow the usual rules of thumb -- <2 is approximately equivalent,
| >6 is a lot better, >10 is so much better that you can probably
| discard the worse models. (See Shane Richards' nice papers on the
| topic.)
|
| ~ If you have several models within a delta-AIC of 10 (or 6) of each
| other, Burnham and Anderson would say you should really be averaging
| model predictions (weighted by Akaike weights) rather than selecting
| a single best model; the weight computation is sketched at the end
| of this message.
|
| ~ If you're talking about a global goodness-of-fit test, then the
| answer's a little bit different. You should do the global GOF
| evaluation on the most-complex model, not a less-complex model
| that was selected for having a better AIC. The standard recipes
| for GOF (checking residual deviance etc.) don't work because the
| negative binomial soaks up any overdispersion -- these recipes
| are geared toward Poisson/binomial data with fixed scale parameters.
| You should do the "usual" graphical diagnostic checking on the
| most complex model (make sure that relationships are linear on
| the scale of the linear predictor, scaled variances are homogeneous,
| distributions within groups follow the expected distribution, and
| there are no gross outliers or points with large leverage, etc.);
| plot(model) will show you a lot of these diagnostics.
| However, there isn't a simple way to get a p value for the goodness
| of fit of the global model in this case. If this is really
| important, you can pick a summary statistic, calculate it for your
| fitted model, then simulate 'data' from the fitted model many times
| and calculate the summary statistic for each simulated data set
| (these represent the null hypothesis that the data really do come
| from the fitted model), and see where your observed statistic falls
| in that distribution; a sketch of this appears at the end of this
| message.
|
| ~ cheers
| ~ Ben Bolker
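~ For completeness, here's what the delta-AIC/Akaike-weight
computation mentioned above looks like (the AIC values here are
made-up placeholders):
aic <- c(m1 = 210.2, m2 = 211.5, m3 = 219.8)   # hypothetical AIC values
delta <- aic - min(aic)                        # delta-AIC
w <- exp(-delta/2)/sum(exp(-delta/2))          # Akaike weights
round(cbind(delta, weight = w), 3)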
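~ And a minimal sketch of the simulation-based goodness-of-fit test
described above, reusing the fitted model "m" from the simulation and
(arbitrarily) the sum of squared Pearson residuals as the summary
statistic:
## parametric bootstrap GOF: simulate 'data' from the fitted model,
## refit, and see where the observed statistic falls in the
## simulated (null) distribution
statfun <- function(fit) sum(residuals(fit, type = "pearson")^2)
obs <- statfun(m)
nsim <- 1000                       # reduce if this is too slow
simstat <- numeric(nsim)
for (i in seq_len(nsim)) {
    ysim <- rnbinom(length(y), mu = fitted(m), size = m$theta)
    simstat[i] <- statfun(glm.nb(ysim ~ x + f))  # may warn occasionally
}
mean(simstat >= obs)               # approximate GOF p value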