[R] all possible subsets, with AIC

Frank E Harrell Jr f.harrell at Vanderbilt.Edu
Mon Feb 15 15:09:06 CET 2010


Nutter, Benjamin wrote:
> I've dabbled in this a little bit, and the result of my dabbling is
> attached.  I'll give you fair warning, however.  The attached function
> can take a long time to run, and if your model has 10 or more
> predictors, you may be retired before it finishes running.
> 
> In any case, it will fit models for all possible subsets of predictors in
> lm, glm, or coxph.  If requested, it will also plot the R-squared,
> Adjusted R-squared, AIC, or BIC of those models (when the values are
> applicable to the model).  It might give you a good starting point.
> 
> Benjamin

Benjamin,

The statistical properties of this approach are on a par with George 
Bush's financial modeling.

One way to see that is to use simulations with lm to compute the bias in 
estimating sigma^2 from the model that the all-subsets search selects.

Frank
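
Frank does not post code for the simulation he suggests. The block below is one way it could be set up in R, offered only as a sketch. The function name `sim_sigma2_after_selection` is hypothetical, and the settings (30 observations, 5 pure-noise predictors, 200 replicates, minimum AIC as the selection rule) are arbitrary choices. The true model is the intercept-only model and every candidate contains an intercept, so each candidate's own residual-variance estimate is unbiased before any selection happens. The `null` column records that benchmark; any shortfall in the `selected` column therefore comes from choosing the model by AIC.

```r
# Sketch (not Frank's code): simulate data in which no predictor has any
# effect and the true error variance is sigma2, pick the all-subsets model
# with the smallest AIC, and record that model's estimate of sigma^2.
sim_sigma2_after_selection <- function(nsim = 200, n = 30, p = 5, sigma2 = 1) {
  preds <- paste0("x", seq_len(p))
  # Every subset of the p predictors: the empty set plus all combinations
  subsets <- c(list(character(0)),
               unlist(lapply(seq_len(p), function(k)
                 combn(preds, k, simplify = FALSE)), recursive = FALSE))
  rhs <- vapply(subsets, function(s)
    if (length(s) == 0) "1" else paste(s, collapse = " + "), character(1))
  forms <- lapply(rhs, function(r) as.formula(paste("y ~", r)))
  out <- data.frame(null = numeric(nsim), selected = numeric(nsim))
  for (i in seq_len(nsim)) {
    dat <- as.data.frame(matrix(rnorm(n * p), ncol = p,
                                dimnames = list(NULL, preds)))
    dat$y <- rnorm(n, sd = sqrt(sigma2))   # y is unrelated to every x
    fits <- lapply(forms, lm, data = dat)
    best <- fits[[which.min(vapply(fits, AIC, numeric(1)))]]
    out$null[i] <- sigma(fits[[1]])^2      # intercept-only (true) model
    out$selected[i] <- sigma(best)^2       # model chosen by minimum AIC
  }
  out
}

set.seed(2010)
est <- sim_sigma2_after_selection()
colMeans(est)   # compare both averages with the true sigma2 = 1
```

With these settings the selected model's estimate can never exceed the intercept-only estimate from the same data set, because a lower AIC requires a proportionally smaller residual sum of squares. Comparing the two column means shows how far selection pushes the estimate below the true value.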

> 
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of kcleary2
> Sent: Friday, February 12, 2010 3:19 PM
> To: r-help at r-project.org
> Subject: [R] all possible subsets, with AIC
> 
> 
> 
> Hello, 
> 
> I have a question about doing ALL possible subsets regression with a
> general linear model. My goal is to produce cumulative Akaike weights
> for each of 7 predictor variables. To obtain this I need R to:
> 
> 1. Show me ALL possible subsets, not just the best possible subsets.
> 
> 2. Give me an AIC value for each model (instead of a BIC value).
> 
> I have tried to do this in library(RcmdrPlugin.HH), and using the
> "leaps" code below. With the leaps code my problem is that my response
> is not a vector; it's a single value (density of a species).
> 
> Any help would be greatly appreciated. Thanks a lot,
> Kate
> 
> ALL-SUBSETS REGRESSION
> 
> DESCRIPTION
> 
> leaps() performs an exhaustive search for the best subsets of the
> variables in x for predicting y in linear regression, using an efficient
> branch-and-bound algorithm. It is a compatibility wrapper for regsubsets
> [1], which does the same thing better.
> 
> Since the algorithm returns a
> best model of each size, the results do not depend on a penalty model
> for model size: it doesn't make any difference whether you want to use
> AIC, BIC, CIC, DIC, ... 
> 
> USAGE
> 
> leaps(x=, y=, wt=rep(1, NROW(x)), int=TRUE, method=c("Cp", "adjr2",
> "r2"), nbest=10, names=NULL, df=NROW(x),
> strictly.compatible=TRUE)
> 
> ARGUMENTS
> 
>   x                    A matrix of predictors
>   y                    A response vector
>   wt                   Optional weight vector
>   int                  Add an intercept to the model
>   method               Calculate Cp, adjusted R-squared or R-squared
>   nbest                Number of subsets of each size to report
>   names                Vector of names for columns of x
>   df                   Total degrees of freedom to use instead of nrow(x)
>                        in calculating Cp and adjusted R-squared
>   strictly.compatible  Implement misfeatures of leaps() in S
> 
> --
> Kate Cleary
> MS Candidate
> Department of Fish, Wildlife, and Conservation Biology
> Colorado State University, Fort Collins, CO
> 970-491-3535
> 
> 
> 
> Links:
> ------
> [1] https://webmail.warnercnr.colostate.edu/leaps/help/regsubsets
> 
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 
> 
> 
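
For the question as posed (cumulative Akaike weights for each of 7 predictors), leaps() and regsubsets() are not enough on their own, since they report only the best model(s) of each size. One alternative is to fit every subset explicitly. The sketch below assumes a data frame with one density value per sampling unit in a response column (named `density` in the example) and the candidate predictors in the remaining columns; the function name `akaike_weights_all_subsets` and the simulated data are hypothetical. Each model's Akaike weight is exp(-delta/2), normalized so the weights sum to 1, where delta is that model's AIC minus the smallest AIC; a predictor's cumulative weight is the sum of the weights of the models that contain it.

```r
# Hypothetical helper: fit every subset of the predictors with glm(), then
# return each model's Akaike weight and each predictor's cumulative weight.
akaike_weights_all_subsets <- function(data, response, family = gaussian()) {
  preds <- setdiff(names(data), response)
  # One row per subset (2^p rows); TRUE means the predictor is in the model
  incl <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), length(preds))))
  colnames(incl) <- preds
  aic <- apply(incl, 1, function(row) {
    rhs <- if (any(row)) paste(preds[row], collapse = " + ") else "1"
    fit <- glm(as.formula(paste(response, "~", rhs)), data = data,
               family = family)
    AIC(fit)
  })
  delta <- aic - min(aic)                      # AIC differences
  w <- exp(-delta / 2) / sum(exp(-delta / 2))  # Akaike weights (sum to 1)
  # incl * w multiplies row i by w[i]; column sums give cumulative weights
  list(models = data.frame(incl, AIC = aic, weight = w),
       cumulative = colSums(incl * w))
}

# Usage with simulated data (3 predictors, only "a" has a real effect)
set.seed(42)
dat <- as.data.frame(matrix(rnorm(80 * 3), ncol = 3,
                            dimnames = list(NULL, c("a", "b", "c"))))
dat$density <- 2 + 1.5 * dat$a + rnorm(80)
res <- akaike_weights_all_subsets(dat, "density")
res$cumulative
```

With 7 predictors the same call fits 2^7 = 128 models, and `res$models` lists every one of them with its AIC and weight, which covers both points 1 and 2 of the original request.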

-- 
Frank E Harrell Jr   Professor and Chairman        School of Medicine
                      Department of Biostatistics   Vanderbilt University


