[R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

Frank Harrell f.harrell at vanderbilt.edu
Fri Jan 27 19:41:24 CET 2012


Ruben, you are mistaken on every single point.  But I see it's not worth
continuing this discussion.
Frank

Rubén Roa wrote
> 
> -----Original Message-----
> From: r-help-bounces@ [mailto:r-help-bounces@] On behalf of Frank Harrell
> Sent: Friday, 27 January 2012 14:28
> To: r-help@
> Subject: Re: [R] How do I compare 47 GLM models with 1 to 5 interactions
> and unique combinations?
> 
> Ruben, I'm not sure you understand the ramifications of what Bert
> said.  In addition, you are implicitly making several assumptions:
> 
> --
> Ruben: Frank, I guess we are going nowhere now.
> But thanks anyway. See below if you want.
> 
> 1. model selection is needed (vs. fitting the full model and using
> shrinkage)
> Ruben: Nonlinear mechanistic models that are too complex often just don't
> converge; they crash. There is no shrinkage to apply to a model that failed to converge.
> 
> 2. model selection works in the absence of shrinkage 
> Ruben: I think you are assuming that shrinkage is necessary.
> 
> 3. model selection can find the "right" model and the features selected
> would be the same features selected if the data were slightly perturbed or
> a new sample taken 
> Ruben: I don't make this assumption. New data, new model.
> 
> 4. AIC tells you something that P-values don't (unless using structured
> multiple degree of freedom tests)
> Ruben: It does.
> 
>  5. parsimony is good
> Ruben: It is.
> 
> None of these assumptions is true.  Model selection without shrinkage
> (penalization) seems to offer benefits, but this is largely a mirage.
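>
> A minimal sketch of what "fit the full model and apply shrinkage" (point 1
> above) could look like in R, assuming a binary response y and predictors
> x1..x5 in a hypothetical data frame dat, and using a ridge penalty via the
> glmnet package:
>
>   library(glmnet)
>   ## full model with all two-way interactions; dat, y, x1..x5 are hypothetical
>   X <- model.matrix(y ~ (x1 + x2 + x3 + x4 + x5)^2, data = dat)[, -1]
>   ## alpha = 0 gives a pure ridge (shrinkage) penalty; lambda is chosen by cross-validation
>   cvfit <- cv.glmnet(X, dat$y, family = "binomial", alpha = 0)
>   coef(cvfit, s = "lambda.min")   # shrunken coefficients of the full model
>
> For a non-binomial GLM the family argument would change accordingly.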
> 
> Ruben: Have a good weekend!
> 
> Ruben
> 
> Rubén Roa wrote
>> 
>> -----Original Message-----
>> From: Bert Gunter [mailto:gunter.berton@] Sent: Thursday, 26
>> January 2012 21:20
>> To: Rubén Roa
>> CC: Ben Bolker; Frank Harrell
>> Subject: Re: [R] How do I compare 47 GLM models with 1 to 5
>> interactions and unique combinations?
>> 
>> On Wed, Jan 25, 2012 at 11:39 PM, Rubén Roa <rroa@> wrote:
>>> I think we have gone through this before.
>>> First, the destruction of all aspects of statistical inference is not 
>>> at stake, Frank Harrell's post notwithstanding.
>>> Second, checking all pairs is a way to see, for _all pairs_, which
>>> model outcompetes which in terms of predictive ability by 2 AIC units
>>> or more. Just sorting them by AIC does not give you that when no model
>>> is better than the next best by 2 AIC units or more.
>>> Third, I was not implying that AIC differences play the role of
>>> significance tests. I agree with you that model selection is better
>>> not understood as a proxy for, or a relative of, significance-testing
>>> procedures.
>>> Incidentally, when comparing many models the AIC is often inconclusive.
>>> If one is bent on selecting just _the model_, then I check numerical
>>> optimization diagnostics such as the size of the gradients, the KKT
>>> criteria, and other issues such as the standard errors of the parameter
>>> estimates and the correlation matrix of the parameter estimates.
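>>>
>>> A minimal sketch of that all-pairs comparison, assuming the candidate
>>> fits are collected in a hypothetical named list of glm objects:
>>>
>>>   ## 'fits' is a hypothetical named list, e.g. list(m1 = fit1, m2 = fit2, ...)
>>>   aics <- sapply(fits, AIC)
>>>   dAIC <- outer(aics, aics, "-")   # dAIC[i, j] = AIC of model i minus AIC of model j
>>>   dAIC <= -2                       # TRUE where model i beats model j by 2 AIC units or more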
>> 
>> -- And the mathematical basis for this claim is ....  ??
>> 
>> --
>> Ruben: In my area of work (building/testing/applying mechanistic
>> nonlinear models of natural and economic phenomena) the issue of
>> numerical optimization is a very serious one. It is often the case
>> that a really good-looking model does not converge properly (that's
>> why ADMB is so popular among us). So if the AIC is inconclusive, but
>> one AIC-tied model yields reasonable-looking standard errors and low
>> pairwise correlations among parameter estimates, while the other wasn't
>> even able to produce a positive-definite Hessian matrix (though it was
>> able to maximize the log-likelihood), I think I have good reasons to
>> select the properly converged model. I assume here that the lack of
>> convergence is a symptom of model inadequacy for the data that the AIC
>> was not able to detect.
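>>
>> That kind of convergence check can be sketched in R with optim(),
>> assuming a hypothetical negative log-likelihood function nll() and a
>> vector of starting values start:
>>
>>   ## nll and start are hypothetical; hessian = TRUE returns the numerically
>>   ## estimated Hessian of the negative log-likelihood at the optimum
>>   fit <- optim(start, nll, method = "BFGS", hessian = TRUE)
>>   ev  <- eigen(fit$hessian, symmetric = TRUE, only.values = TRUE)$values
>>   pd  <- all(ev > 0)                       # positive-definite Hessian?
>>   if (pd) {
>>     vcv <- solve(fit$hessian)              # approximate covariance matrix
>>     se  <- sqrt(diag(vcv))                 # standard errors
>>     rho <- cov2cor(vcv)                    # correlations of parameter estimates
>>   }
>>   aic <- 2 * fit$value + 2 * length(start) # AIC = -2 logLik + 2k (fit$value is the minimized -logLik)
>>
>> Comparing AIC only among fits that pass this kind of check is one way to
>> combine the two criteria.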
>> ---
>> Ruben: For some reason I don't find model averaging appealing. I
>> guess deep in my heart I expect more from my model than just the best
>> predictive ability.
>> 
>> -- This is a religious, not a scientific, statement, and has no place
>> in any scientific discussion.
>> 
>> --
>> Ruben: Seriously, there is a wide and open place in scientific
>> discussion for mechanistic model-building. I expect the models I build
>> to be more than able predictors: I want them to capture some aspect of
>> nature, to teach me something about nature, so I refrain from model
>> averaging, which is an open admission that you don't care too much
>> about what's really going on.
>> 
>> -- The belief that certain data analysis practices -- standard or not
>> -- somehow can be trusted to yield reliable scientific results in the
>> face of clear theoretical (mathematical) and practical results to the
>> contrary, while widespread, impedes and often thwarts the progress of
>> science. Evidence-based medicine and clinical trials came about for a
>> reason. I would encourage you to reexamine the basis of your
>> scientific practice and the role that "magical thinking" plays in it.
>> 
>> Best,
>> Bert
>> 
>> --
>> Ruben: All right, Bert. I often re-examine the basis of my scientific
>> praxis, but less often than I did before, I have to confess. I like to
>> think it is because I am converging on the right praxis, so there are
>> fewer issues to re-examine. But this problem of model selection is a
>> tough one.
>> Being a likelihoodist in inference naturally leads you to AIC-based
>> model selection, Jim Lindsey being an influence too. Wanting your
>> models to say something about nature is another strong conditioning
>> factor. Put these two together and it becomes hard to do
>> model averaging. And it has nothing to do with religion (yuck!).
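>>
>> For anyone who does want to try the model-averaging route, Akaike weights
>> can be computed along these lines (a sketch; 'fits' is a hypothetical
>> named list of fitted glm objects for the candidate set):
>>
>>   aics  <- sapply(fits, AIC)
>>   delta <- aics - min(aics)                        # Delta-AIC relative to the best model
>>   w     <- exp(-delta / 2) / sum(exp(-delta / 2))  # Akaike weights
>>   round(cbind(delta = delta, weight = w), 3)
>>
>> The MuMIn package wraps this kind of bookkeeping (model.sel, model.avg)
>> if the candidate set gets large.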
>> 
>> 
> 
> 
> -----
> Frank Harrell
> Department of Biostatistics, Vanderbilt University
> --
> View this message in context:
> http://r.789695.n4.nabble.com/How-do-I-compare-47-GLM-models-with-1-to-5-interactions-and-unique-combinations-tp4326407p4333464.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> ______________________________________________
> R-help@ mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 


-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/How-do-I-compare-47-GLM-models-with-1-to-5-interactions-and-unique-combinations-tp4326407p4334353.html
Sent from the R help mailing list archive at Nabble.com.


