[R] fitting of all possible models

Frank E Harrell Jr f.harrell at vanderbilt.edu
Tue Feb 27 17:58:12 CET 2007


Bert Gunter wrote:
> ... Below
> 
> -- Bert 
> 
> Bert Gunter
> Genentech Nonclinical Statistics
> South San Francisco, CA 94404
> 650-467-7374
> 
> 
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Frank E Harrell Jr
> Sent: Tuesday, February 27, 2007 5:14 AM
> To: Indermaur Lukas
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] fitting of all possible models
> 
> Indermaur Lukas wrote:
>> Hi,
>> Fitting all possible models (GLM) with 10 predictors results in a large
>> number (2^10 - 1) of models. I want to do that in order to get the
>> importance of variables (having an unbalanced variable design) by summing
>> up the AIC weights of the models that include a given variable, for every
>> variable separately. It's time consuming and annoying to define all
>> possible models by hand.
>>  
>> Is there a command, or easy solution, to let R define the set of all
>> possible models itself? I defined models in the following way to process
>> them with a batch job:
>>  
>> # e.g. model 1
>> preference <- formula(Y ~ Lwd + N + Sex + YY)
>>
>> # e.g. model 2
>> preference_heterogeneity <- formula(Y ~ Ri + Lwd + N + Sex + YY)
>> etc.
>>  
>>  
>> I appreciate any hint
>> Cheers
>> Lukas
> 
> If you choose the model from among the 2^10 - 1 having the best AIC, that
> model will be badly biased.  Why look at so many?  Pre-specification of
> models, or fitting full models with penalization,
> 
> --- ...the rub being how much to penalize. My impression from what I've read
> is that, for prediction, it is close to "the more, the better the
> predictor...". Nature rewards parsimony.
> 
> Cheers,
> Bert

Bert,

In my experience nature rewards complexity, if done right.  See Savage's
antiparsimony principle, quoted in the annotation below.  -Frank

@Article{gre00whe,
   author =               {Greenland, Sander},
   title =                {When should epidemiologic regressions use random coefficients?},
   journal =              {Biometrics},
   year =                 2000,
   volume =               56,
   pages =                {915-921},
   annote =               {Bayesian methods;causal inference;empirical Bayes
estimators;epidemiologic method;hierarchical regression;mixed
models;multilevel modeling;random-coefficient
regression;shrinkage;variance components;use of statistics in
epidemiology is largely primitive;stepwise variable selection on
confounders leaves important confounders uncontrolled;composition
matrix;example with far too many significant predictors with many
regression coefficients absurdly inflated when
overfit;lack of evidence for dietary effects mediated through
constituents;shrinkage instead of variable selection;larger effect on
confidence interval width than on point estimates with variable
selection;uncertainty about variance of random effects is just
uncertainty about prior opinion;estimation of variance is
pointless;instead the analysis should be repeated using different
values;"if one feels compelled to estimate $\tau^2$, I would recommend
giving it a proper prior concentrated among contextually reasonable
values";claim about ordinary MLE being unbiased is misleading because
it assumes the model is correct and is the only model
entertained;shrinkage towards compositional model;"models need to be
complex to capture uncertainty about the relations...an honest
uncertainty assessment requires parameters for all effects that we
know may be present.  This advice is implicit in an antiparsimony
principle often attributed to L. J. Savage 'All models should be as
big as an elephant' (see Draper, 1995)".  See also gus06per.}
}
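
For readers who want something concrete, here is a minimal sketch of the
"full model with penalization" option mentioned above, which is also the
"shrinkage instead of variable selection" point in the annotation. It is
only an illustration under stated assumptions: the data are simulated, the
names X, y, beta_true and fit_ridge are made up, the penalty is a plain
ridge (quadratic) penalty on the slopes of a logistic model, and the value
lambda = 5 is arbitrary. How much to penalize is exactly the open question
raised above, and this sketch does not try to answer it.

```r
# Simulated stand-in data (hypothetical; not the data from this thread).
set.seed(2)
n <- 100
p <- 10
X <- scale(matrix(rnorm(n * p), n, p))      # standardized predictors
colnames(X) <- paste0("x", 1:p)
beta_true <- c(1, -0.5, rep(0, p - 2))      # arbitrary "true" slopes
y <- rbinom(n, 1, plogis(drop(X %*% beta_true)))

# Negative penalized log-likelihood for logistic regression.
# theta[1] is the intercept (not penalized), theta[-1] are the slopes.
neg_pen_loglik <- function(theta, lambda) {
  eta <- theta[1] + drop(X %*% theta[-1])
  -sum(y * eta - log1p(exp(eta))) + 0.5 * lambda * sum(theta[-1]^2)
}

# Analytic gradient of the function above, used by BFGS.
neg_pen_grad <- function(theta, lambda) {
  resid <- y - plogis(theta[1] + drop(X %*% theta[-1]))
  -c(sum(resid), drop(crossprod(X, resid))) + lambda * c(0, theta[-1])
}

# Fit the full model (all p predictors) for a given penalty.
fit_ridge <- function(lambda) {
  optim(rep(0, p + 1), neg_pen_loglik, neg_pen_grad,
        lambda = lambda, method = "BFGS")$par
}

unpenalized <- fit_ridge(0)   # ordinary maximum likelihood
penalized   <- fit_ridge(5)   # arbitrary penalty, for illustration only

# Side-by-side coefficients: all predictors stay in the model,
# but the penalized slopes are pulled toward zero.
round(cbind(unpenalized, penalized), 2)
```

Every predictor stays in the model; only the size of the coefficients
changes. In practice one would use a dedicated penalized-regression routine
and choose the penalty by some principled criterion rather than fixing it
by hand as done here.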

> 
> 
> Frank
> 
>>  
>>  
>>  
>>  
>>  
>> °°° 
>> Lukas Indermaur, PhD student 
>> eawag / Swiss Federal Institute of Aquatic Science and Technology 
>> ECO - Department of Aquatic Ecology
>> Überlandstrasse 133
>> CH-8600 Dübendorf
>> Switzerland
>>  
>> Phone: +41 (0) 71 220 38 25
>> Fax    : +41 (0) 44 823 53 15 
>> Email: lukas.indermaur at eawag.ch
>> www.lukasindermaur.ch
>>
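
On the purely mechanical part of the original question (letting R define
all the models and summing AIC weights per variable), a minimal sketch in
base R might look like the following. The data frame dat, the outcome y
and the predictor names x1..x4 are hypothetical stand-ins, and four
predictors are used instead of ten only to keep the example small. This
shows how the bookkeeping can be done; it does not remove the concern
above about the bias of selecting among so many models.

```r
# Simulated stand-in data (hypothetical names, not the original variables).
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.8 * dat$x1 - 0.5 * dat$x2))

predictors <- c("x1", "x2", "x3", "x4")

# All non-empty subsets of the predictors: 2^p - 1 of them.
subsets <- unlist(
  lapply(seq_along(predictors),
         function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)

# One formula and one fitted GLM per subset.
formulas <- lapply(subsets, function(s) reformulate(s, response = "y"))
fits <- lapply(formulas, function(f) glm(f, family = binomial, data = dat))

# Akaike weights: w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2),
# where delta_i = AIC_i - min(AIC).
aic <- sapply(fits, AIC)
delta <- aic - min(aic)
w <- exp(-delta / 2) / sum(exp(-delta / 2))

# For each predictor, sum the weights of the models that contain it.
importance <- sapply(predictors, function(v) {
  sum(w[sapply(subsets, function(s) v %in% s)])
})

round(importance, 3)
```

With p predictors each variable appears in 2^(p - 1) of the 2^p - 1 models,
so the summed weights are comparable across variables. For ten predictors
the same code fits 1023 models.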


-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University


