[R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

Frank Harrell f.harrell at vanderbilt.edu
Thu Jan 26 18:10:08 CET 2012


To pretend that AIC solves this problem is to ignore that AIC is just a
restatement of P-values.
Frank

Rubén Roa wrote
> 
> I think we have gone through this before.
> First, the destruction of all aspects of statistical inference is not at
> stake, Frank Harrell's post notwithstanding.
> Second, checking all pairs is a way to see for _all pairs_ which model
> outcompetes which in terms of predictive ability by -2AIC or more. Just
> sorting them by the AIC does not give you that if no model is better than
> the next best by less than 2AIC.
> Third, I was not implying that AIC differences play the role of
> significance tests. I agree with you that model selection is better not
> understood as a proxy or as a relative of significance tests procedures.
> Incidentally, when comparing many models the AIC is often inconclusive. If
> one is bent on selecting just _the model_ then I check numerical
> optimization diagnostics such as size of gradients, KKT criteria, and
> other issues such as standard errors of parameter estimates and the
> correlation matrix of parameter estimates. For some reasons I don't find
> model averaging appealing. I guess deep in my heart I expect more from my
> model than just the best predictive ability.
> 
> Rubén
> 
> -----Mensaje original-----
> De: r-help-bounces@ [mailto:r-help-bounces@] En nombre de Ben Bolker
> Enviado el: miércoles, 25 de enero de 2012 15:41
> Para: r-help at .ethz
> Asunto: Re: [R]How do I compare 47 GLM models with 1 to 5 interactions and
> unique combinations?
> 
> Rubén Roa <rroa <at> azti.es> writes:
> 
>> A more 'manual' way to do it is this.
> 
>> If you have all your models named properly and well organized, say 
>> Model1, Model2, ..., Model 47, with a slot with the AIC (glm produces 
>> a list with one component named 'aic') then something like
>> this:
>  
>> x <- matrix(0,1081,3) x[,1:2] <- t(combn(47,2)) for(i in 1:n){x[,3]
>> <- abs(unlist(combn(eval(as.name(paste("Model",i,sep="")))$aic[1,])-
>> unlist(combn(eval(as.name(paste("Model",i,sep="")))$aic,2)[2,]))}
> 
>  
>> will calculate all the 1081 AIC differences between paired comparisons 
>> of the 47 models. It may not be pretty but works for me.
> 
>  
>> An AIC difference equal or less than 2 is a tie, anything higher is 
>> evidence for the model with the less AIC (Sakamoto et al., Akaike 
>> Information Criterion Statistics, KTK Scientific Publishers, Tokyo).
>  
> 
>   I wouldn't go quite as far as Frank Harrell did in his answer, but
> 
> (1) it seems silly to do all pairwise comparisons of models; one of the
> big advantages of information-theoretic approaches is that you can just
> list the models in order of AIC ...
> 
> (2) the dredge() function from the MuMIn package may be useful for
> automating this sort of thing.  There is an also an AICtab function in the
> emdbook package.
> 
> (3) If you're really just interested in picking the single model with the
> best expected predictive capability (and all of the other assumptions of
> the AIC approach are met -- very large data set, good fit to the data,
> etc.), then just picking the model with the best AIC will work.  It's when
> you start to make inferences on the individual parameters within that
> model, without taking account of the model selection process, that you run
> into trouble.  If you are really concerned about good predictions then it
> may be better to do multi-model averaging *or* use some form of shrinkage
> estimator.
> 
> (4) The "delta-AIC<2 means pretty much the same" rule of thumb is fine,
> but it drives me nuts when people use it as a quasi-significance-testing
> rule, as in "the simpler model is adequate if dAIC<2".  If you're going to
> work in the model selection arena, then don't follow hypothesis-testing
> procedures!  A smaller AIC means lower expected K-L distance, period.
> 
> For the record, Brian Ripley has often expressed the (minority) opinion
> that AIC is *not* appropriate for comparing non-nested models (e.g. 
> <http://tolstoy.newcastle.edu.au/R/help/06/02/21794.html>).
> 
> 
>> 
>> -----Original Message-----
>> From: r-help-bounces <at> r-project.org on behalf of Milan 
>> Bouchet-Valat
>> Sent: Wed 1/25/2012 10:32 AM
>> To: Jhope
>> Cc: r-help <at> r-project.org
> 
>> Subject: Re: [R] How do I compare 47 GLM models with 1 to 5  
>> interactions and unique combinations?
> 
>  
>> Le mardi 24 janvier 2012 à 20:41 -0800, Jhope a écrit :
>> > Hi R-listers,
>> > 
>> > I have developed 47 GLM models with different combinations of 
>> > interactions from 1 variable to 5 variables. I have manually made 
>> > each model separately and put them into individual tables (organized 
>> > by the number of variables) showing the AIC score. I want to compare
>> all of these models.
>> > 
>> > 1) What is the best way to compare various models with unique 
>> > combinations and different number of variables?
>> See ?step or ?stepAIC (from package MASS) if you want an automated way 
>> of doing this.
>> 
>> > 2) I am trying to develop the most simplest model ideally. Even 
>> > though adding another variable would lower the AIC,
> 
>   No, not necessarily.
> 
>> how do I interpret it is worth
>> > it to include another variable in the model? How do I know when to
>> stop? 
> 
>   When the AIC stops decreasing.
> 
>> This is a general statistical question, not specific to R. As a 
>> general rule, if adding a variable lowers the AIC by a significant 
>> margin, then it's worth including it.
> 
>   If the variable lowers the AIC *at all* then your best estimate is that
> you should include it.
> 
>> You should only stop when a variable increases the AIC. But this is 
>> assuming you consider it a good indicator and you know what you're 
>> doing. There's plenty of literature on this subject.
>> 
>> > Definitions of Variables:
>> > HTL - distance to high tide line (continuous) Veg - distance to 
>> > vegetation Aeventexhumed - Event of exhumation Sector - number 
>> > measurements along the beach Rayos - major sections of beach 
>> > (grouped sectors) TotalEggs - nest egg density
>> > 
>> > Example of how all models were created: 
>> > Model2.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed, 
>> > data=data.to.analyze, family=binomial) Model7.glm <- 
>> > glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family = binomial, 
>> > data.to.analyze) Model21.glm <- glm(cbind(Shells, TotalEggs-Shells) 
>> > ~ HTL:Veg:TotalEggs, data.to.analyze, family = binomial) Model37.glm 
>> > <- glm(cbind(Shells, TotalEggs-Shells) ~ 
>> > HTL:Veg:TotalEggs:Aeventexhumed, data.to.analyze, family=binomial)
>> To extract the AICs of all these models, it's easier to put them in a 
>> list and get their AICs like this:
>> m <- list()
>> m$model2 <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed, 
>> data=data.to.analyze, family=binomial)
>> m$model3 <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family = 
>> binomial, data.to.analyze)
>> 
>> sapply(m, extractAIC)
>> 
>> Cheers
>>
> 
> ______________________________________________
> R-help@ mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> ______________________________________________
> R-help@ mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 


-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/How-do-I-compare-47-GLM-models-with-1-to-5-interactions-and-unique-combinations-tp4326407p4331016.html
Sent from the R help mailing list archive at Nabble.com.



More information about the R-help mailing list