[R-sig-eco] glm model evaluation
Ben Bolker
bolker at ufl.edu
Thu May 29 20:13:01 CEST 2008
Manuel Spínola wrote:
 Thank you very much Ben.
 Yes, my question is about after the model selection procedure (after using
 AIC). It is my understanding that modeling doesn't finish with finding
 the best model using IT methods; you should also check whether the selected
 model is a good model, right?
~ Yes, but ... any formal goodness-of-fit examination should
really be done on the full (most complex) model, before trying
to select a model. The idea is that, if the most complex model
is a reasonable fit, then any simpler models that are selected
will fit adequately (because if they didn't fit adequately,
*relative to the most complex model*, they wouldn't be selected ...)
~ That said, there's no reason not to look at the selected
model for adequacy as well; it's just that it would be very surprising
if the full model were adequate and the reduced one weren't.

 In your message you wrote:

 "make sure that relationships are linear on the scale of the linear
 predictor, scaled variances are homogeneous"

 what do you mean by "on the scale of the linear predictor"? How can I do
 this in R? What if one of my variables is categorical?
~ This comment only applies to continuous predictors. "On the scale of
the linear predictor" means on the log scale (in the case of a
negative binomial model, which usually has a log link).

 Also, what do you mean that "scaled variances are homogeneous"? Is that
 an assumption for glm?
~ GLM assumes that there is a single negative binomial k parameter
that covers all groups. So the variance scaled by the expected variance
for a given group (given this same neg binomial k) should be about
the same in all groups.
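As a hedged illustration of that check (the data, group means, and the value of k below are made up, not from Manuel's models), one can compare each group's sample variance to the NB-expected variance mu + mu^2/k under a single shared k:

```r
## Illustrative only: simulate two groups that share one NB size parameter
## k, then divide each group's sample variance by the NB expectation
## mu + mu^2/k. Ratios near 1 suggest one k covers all groups.
set.seed(2)                              # arbitrary seed for this sketch
k <- 0.5                                 # assumed common NB size parameter
g <- rep(c("a", "b"), each = 200)        # two hypothetical groups
mu <- ifelse(g == "a", 3, 20)            # made-up group means
y <- rnbinom(400, mu = mu, size = k)

grp.mean <- tapply(y, g, mean)
grp.var  <- tapply(y, g, var)
scaled <- grp.var / (grp.mean + grp.mean^2 / k)  # scaled variances
scaled   # both ratios should be roughly 1 when one k fits all groups
```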

 Are there any other considerations when using negative binomial models? I
 decided on this type of model because I have overdispersion in the
 Poisson models.
~ Good choice (although Burnham and Anderson 2002 say that
quasi-likelihood approaches usually work just fine, I prefer
NB models where they're feasible).
 Thank you again Ben.
 Best,

 Manuel
~ Here's a little simulation to play with ... try different
random-number seeds to see how big the deviations are when
the model is correct.
set.seed(1002)
f <- factor(rep(c("a", "b"), each = 40))
x <- runif(80)
b <- c(1, 3)
eta <- b[f] + 2 * x
y <- rnbinom(80, mu = exp(eta), size = 0.5)
library(lattice)
xyplot(jitter(y + 0.1, 0.1) ~ x | f, scales = list(y = list(log = TRUE)))
library(MASS)
m <- glm.nb(y ~ x + f)
plot(m)


 Ben Bolker wrote:
 Manuel Spínola wrote:
  Dear list members,
 
  I am fitting negative binomial models with the nb.glm function (MASS
  package).
  I ran several models and did model selection using AIC.
  What is a good way to evaluate how good the selected model is (lowest AIC
  and considerable Akaike weight)?
  Is model diagnostics a good approach?
  Thank you very much in advance.
 
  Best,
 
  Manuel Spínola
 

 ~ Manuel,

 ~ I'm not absolutely sure what your question is.

 ~ If you're talking about evaluating the relative merit of
 the selected model, it's a question of delta-AIC (or delta-AICc);
 follow the usual rules of thumb: <2 is approximately equivalent,
 6 is a lot better, >10 is so good that you can probably discard
 worse models. (See Shane Richards' nice papers on the topic.)

 ~ If you have several models within delta-AIC of 10 (or 6) of each
 other, Burnham and Anderson would say you should really be
 averaging model predictions etc. rather than selecting a single
 best model.
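For concreteness, the model weights used in that kind of averaging follow directly from the AIC values; here is a minimal sketch with invented AIC numbers (not from Manuel's models):

```r
## Akaike weights from a vector of AIC values (the numbers are made up).
aic <- c(m1 = 100.0, m2 = 101.5, m3 = 110.2)
delta <- aic - min(aic)                        # delta-AIC per model
w <- exp(-delta / 2) / sum(exp(-delta / 2))    # Akaike weights, sum to 1
round(w, 3)   # m1 carries most of the weight; m3 is nearly negligible
```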

 ~ If you're talking about a global goodness-of-fit test, then the
 answer's a little bit different. You should do the global GOF
 evaluation on the most-complex model, not a less-complex model
 that was selected for having a better AIC. The standard recipes
 for GOF (checking residual deviance etc.) don't work because the
 negative binomial soaks up any overdispersion; those recipes
 are geared toward Poisson/binomial data with fixed scale parameters.
 You should do the "usual" graphical diagnostic checking on the
 most complex model (make sure that relationships are linear on
 the scale of the linear predictor, scaled variances are homogeneous,
 distributions within groups follow the expected distribution,
 and there are no gross outliers or points with large leverage);
 plot(model) will show you a lot of these diagnostics.
 However, there isn't a simple way to get a p value for goodness
 of the fit of the global model in this case. (If this is really
 important, you can pick a summary statistic, calculate it for
 your fitted model, then simulate 'data' from the fitted model many times
 and calculate the summary statistics for the simulated data
 (which represent the null hypothesis that the data really do
 come from the fitted model) and see where your observed
 statistic falls in the distribution.)
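The simulation recipe in that last parenthesis amounts to a parametric bootstrap; here is one possible sketch, in which the data, the choice of summary statistic, and the number of replicates are all illustrative assumptions, not Manuel's analysis:

```r
## Parametric-bootstrap GOF sketch: simulate 'data' from the fitted NB
## model and see where the observed summary statistic falls in the
## resulting null distribution.
library(MASS)
set.seed(101)                                 # arbitrary seed
x <- runif(100)
y <- rnbinom(100, mu = exp(1 + 2 * x), size = 0.7)  # fake 'observed' data
fit <- glm.nb(y ~ x)

stat <- function(z) var(z) / mean(z)   # one possible summary statistic
obs <- stat(y)
## simulate from the fitted model (mu = fitted values, size = estimated k)
sims <- replicate(500,
                  stat(rnbinom(100, mu = fitted(fit), size = fit$theta)))
p <- mean(sims >= obs)   # upper-tail p value under the fitted model
```

Other summary statistics (e.g. the proportion of zeros) could be swapped in; the point is only that the statistic is computed identically for the observed and simulated data.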

 ~ cheers
 ~ Ben Bolker