[R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?
Frank Harrell
f.harrell at vanderbilt.edu
Thu Jan 26 18:10:08 CET 2012
To pretend that AIC solves this problem is to ignore that AIC is just a
restatement of P-values.
Frank
Rubén Roa wrote
>
> I think we have gone through this before.
> First, the destruction of all aspects of statistical inference is not at
> stake, Frank Harrell's post notwithstanding.
> Second, checking all pairs is a way to see, for _all pairs_, which model
> outcompetes which in terms of predictive ability by 2 AIC units or more. Just
> sorting the models by AIC does not give you that when adjacent models differ
> by less than 2 AIC units.
> Third, I was not implying that AIC differences play the role of
> significance tests. I agree with you that model selection is better not
> understood as a proxy or as a relative of significance tests procedures.
> Incidentally, when comparing many models the AIC is often inconclusive. If
> one is bent on selecting just _the model_ then I check numerical
> optimization diagnostics such as size of gradients, KKT criteria, and
> other issues such as standard errors of parameter estimates and the
> correlation matrix of parameter estimates. For some reason I don't find
> model averaging appealing. I guess deep in my heart I expect more from my
> model than just the best predictive ability.
>
> Rubén
>
> -----Original Message-----
> From: r-help-bounces@ [mailto:r-help-bounces@] On behalf of Ben Bolker
> Sent: Wednesday, 25 January 2012 15:41
> To: r-help at .ethz
> Subject: Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and
> unique combinations?
>
> Rubén Roa <rroa <at> azti.es> writes:
>
>> A more 'manual' way to do it is this.
>
>> If you have all your models named properly and well organized, say
>> Model1, Model2, ..., Model47, each carrying its AIC (glm returns a
>> list with a component named 'aic'), then something like
>> this:
>
>> x <- matrix(0, 1081, 3)
>> x[, 1:2] <- t(combn(47, 2))   # all 1081 pairs of model indices
>> aics <- sapply(1:47, function(i) get(paste("Model", i, sep = ""))$aic)
>> x[, 3] <- abs(aics[x[, 1]] - aics[x[, 2]])   # pairwise AIC differences
>
>
>> will calculate all the 1081 AIC differences between paired comparisons
>> of the 47 models. It may not be pretty but works for me.
>
>
>> An AIC difference equal to or less than 2 is a tie; anything higher is
>> evidence for the model with the lower AIC (Sakamoto et al., Akaike
>> Information Criterion Statistics, KTK Scientific Publishers, Tokyo).
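For reference, the pairwise bookkeeping described above can be sketched in a self-contained way. The data and the three candidate models below are simulated stand-ins for the 47 models in the thread, so the numbers are illustrative only:

```r
## Simulated stand-in data (illustrative only)
set.seed(1)
d <- data.frame(y = rbinom(100, 1, 0.5), x1 = rnorm(100), x2 = rnorm(100))

## A few candidate GLMs standing in for the 47 models
models <- list(
  m1 = glm(y ~ x1,      data = d, family = binomial),
  m2 = glm(y ~ x2,      data = d, family = binomial),
  m3 = glm(y ~ x1 + x2, data = d, family = binomial)
)

aics  <- sapply(models, AIC)              # one AIC per model
pairs <- t(combn(length(models), 2))      # all unordered pairs of models
daic  <- abs(aics[pairs[, 1]] - aics[pairs[, 2]])
ties  <- daic <= 2                        # delta-AIC <= 2: treat as a tie
```

With 47 models the same three lines after the fits give all choose(47, 2) = 1081 differences at once.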
>
>
> I wouldn't go quite as far as Frank Harrell did in his answer, but
>
> (1) it seems silly to do all pairwise comparisons of models; one of the
> big advantages of information-theoretic approaches is that you can just
> list the models in order of AIC ...
>
> (2) the dredge() function from the MuMIn package may be useful for
> automating this sort of thing. There is also an AICtab function in the
> emdbook package.
>
> (3) If you're really just interested in picking the single model with the
> best expected predictive capability (and all of the other assumptions of
> the AIC approach are met -- very large data set, good fit to the data,
> etc.), then just picking the model with the best AIC will work. It's when
> you start to make inferences on the individual parameters within that
> model, without taking account of the model selection process, that you run
> into trouble. If you are really concerned about good predictions then it
> may be better to do multi-model averaging *or* use some form of shrinkage
> estimator.
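The multi-model averaging Ben mentions is usually done through Akaike weights. A minimal sketch of the weight calculation, with made-up AIC values (not from the thread's models):

```r
## Illustrative AIC values for three candidate models
aics <- c(m1 = 210.3, m2 = 208.9, m3 = 212.1)

daic <- aics - min(aics)   # delta-AIC relative to the best model
w    <- exp(-0.5 * daic)   # relative likelihoods of each model
w    <- w / sum(w)         # Akaike weights, summing to 1
round(w, 3)
```

Predictions from the candidate models can then be combined using these weights.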
>
> (4) The "delta-AIC<2 means pretty much the same" rule of thumb is fine,
> but it drives me nuts when people use it as a quasi-significance-testing
> rule, as in "the simpler model is adequate if dAIC<2". If you're going to
> work in the model selection arena, then don't follow hypothesis-testing
> procedures! A smaller AIC means lower expected K-L distance, period.
>
> For the record, Brian Ripley has often expressed the (minority) opinion
> that AIC is *not* appropriate for comparing non-nested models (e.g.
> <http://tolstoy.newcastle.edu.au/R/help/06/02/21794.html>).
>
>
>>
>> -----Original Message-----
>> From: r-help-bounces <at> r-project.org on behalf of Milan
>> Bouchet-Valat
>> Sent: Wed 1/25/2012 10:32 AM
>> To: Jhope
>> Cc: r-help <at> r-project.org
>
>> Subject: Re: [R] How do I compare 47 GLM models with 1 to 5
>> interactions and unique combinations?
>
>
>> On Tuesday, 24 January 2012 at 20:41 -0800, Jhope wrote:
>> > Hi R-listers,
>> >
>> > I have developed 47 GLM models with different combinations of
>> > interactions from 1 variable to 5 variables. I have manually made
>> > each model separately and put them into individual tables (organized
>> > by the number of variables) showing the AIC score. I want to compare
>> all of these models.
>> >
>> > 1) What is the best way to compare various models with unique
>> > combinations and different number of variables?
>> See ?step or ?stepAIC (from package MASS) if you want an automated way
>> of doing this.
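A minimal sketch of what step() (base stats; stepAIC in MASS is similar) does with a glm fit, on simulated data with hypothetical variable names:

```r
## Simulated data (illustrative only)
set.seed(1)
d <- data.frame(y  = rbinom(120, 1, 0.5),
                x1 = rnorm(120), x2 = rnorm(120), x3 = rnorm(120))

## Start from a full model and let step() search by AIC
full <- glm(y ~ x1 * x2 + x3, data = d, family = binomial)

## The search stops when no single add/drop step lowers the AIC
sel <- step(full, direction = "both", trace = 0)
formula(sel)
```

The selected model's AIC is never worse than the starting model's, but the usual caveats about post-selection inference (Frank's point) apply to the coefficients of `sel`.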
>>
>> > 2) Ideally, I am trying to develop the simplest model. Even
>> > though adding another variable would lower the AIC,
>
> No, not necessarily.
>
>> > how do I decide whether it is worth including another variable in the
>> > model? How do I know when to stop?
>
> When the AIC stops decreasing.
>
>> This is a general statistical question, not specific to R. As a
>> general rule, if adding a variable lowers the AIC by a significant
>> margin, then it's worth including it.
>
> If the variable lowers the AIC *at all* then your best estimate is that
> you should include it.
>
>> You should only stop when a variable increases the AIC. But this is
>> assuming you consider it a good indicator and you know what you're
>> doing. There's plenty of literature on this subject.
>>
>> > Definitions of Variables:
>> > HTL - distance to high tide line (continuous)
>> > Veg - distance to vegetation
>> > Aeventexhumed - event of exhumation
>> > Sector - number of measurements along the beach
>> > Rayos - major sections of beach (grouped sectors)
>> > TotalEggs - nest egg density
>> >
>> > Example of how all models were created:
>> > Model2.glm  <- glm(cbind(Shells, TotalEggs - Shells) ~ Aeventexhumed,
>> >                    data = data.to.analyze, family = binomial)
>> > Model7.glm  <- glm(cbind(Shells, TotalEggs - Shells) ~ HTL:Veg,
>> >                    data = data.to.analyze, family = binomial)
>> > Model21.glm <- glm(cbind(Shells, TotalEggs - Shells) ~ HTL:Veg:TotalEggs,
>> >                    data = data.to.analyze, family = binomial)
>> > Model37.glm <- glm(cbind(Shells, TotalEggs - Shells) ~ HTL:Veg:TotalEggs:Aeventexhumed,
>> >                    data = data.to.analyze, family = binomial)
>> To extract the AICs of all these models, it's easier to put them in a
>> list and get their AICs like this:
>> m <- list()
>> m$model2 <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed,
>> data=data.to.analyze, family=binomial)
>> m$model3 <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family =
>> binomial, data.to.analyze)
>>
>> sapply(m, extractAIC)
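For completeness, the list approach above runs end-to-end like this on simulated data. The values are illustrative stand-ins for data.to.analyze, and AIC() is used here instead of extractAIC() so that sapply returns one number per model rather than a two-row matrix:

```r
## Simulated stand-in for data.to.analyze (values are illustrative)
set.seed(42)
n <- 50
data.to.analyze <- data.frame(
  TotalEggs     = rpois(n, 80) + 1,
  HTL           = runif(n, 0, 10),
  Veg           = runif(n, 0, 5),
  Aeventexhumed = factor(sample(c("yes", "no"), n, replace = TRUE))
)
data.to.analyze$Shells <- rbinom(n, data.to.analyze$TotalEggs, 0.3)

## Keep the fitted models together in one list
m <- list()
m$model2 <- glm(cbind(Shells, TotalEggs - Shells) ~ Aeventexhumed,
                data = data.to.analyze, family = binomial)
m$model7 <- glm(cbind(Shells, TotalEggs - Shells) ~ HTL:Veg,
                data = data.to.analyze, family = binomial)

sapply(m, AIC)   # one AIC per model, ready to sort or compare
</imports>
```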
>>
>> Cheers
>>
>
> ______________________________________________
> R-help@ mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
-----
Frank Harrell
Department of Biostatistics, Vanderbilt University