[R] all possible subsets, with AIC
Frank E Harrell Jr
f.harrell at Vanderbilt.Edu
Mon Feb 15 15:09:06 CET 2010
Nutter, Benjamin wrote:
> I've dabbled in this a little bit, and the result of my dabbling is
> attached. I'll give you fair warning, however. The attached function
> can take a long time to run, and if your model has 10 or more
> predictors, you may be retired before it finishes running.
>
> In any case, it will fit models for all possible subsets of predictors in
> lm, glm, or coxph. If requested, it will also plot the R-squared,
> Adjusted R-squared, AIC, or BIC of those models (when the values are
> applicable to the model). It might give you a good starting point.
>
> Benjamin
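
A minimal sketch of the all-subsets idea described above. This is not the
attached function, which is not reproduced in this archive. It fits lm() on
every subset of predictors, records each model's AIC, and converts the AICs
into Akaike weights and per-predictor cumulative weights. The data frame
`dat`, the predictor names x1 to x4, and the simulated response are
hypothetical stand-ins; only base R functions (combn, reformulate, lm, AIC)
are used.

```r
# Hypothetical example data: the response depends on x1 and x2 only.
set.seed(1)
n <- 100
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - 1 * dat$x2 + rnorm(n)

predictors <- c("x1", "x2", "x3", "x4")

# Enumerate every subset of predictors, starting with the empty
# (intercept-only) model.
subsets <- list(character(0))
for (k in seq_along(predictors)) {
  subsets <- c(subsets, combn(predictors, k, simplify = FALSE))
}

# Fit each candidate model with lm() and record its AIC.
fits <- lapply(subsets, function(vars) {
  rhs <- if (length(vars)) vars else "1"
  lm(reformulate(rhs, response = "y"), data = dat)
})
aic <- vapply(fits, AIC, numeric(1))

# Akaike weights: exp(-delta/2), normalised to sum to 1.
delta <- aic - min(aic)
w <- exp(-delta / 2) / sum(exp(-delta / 2))

# Cumulative weight of a predictor = sum of the weights of all models
# that contain it.
cum_w <- vapply(predictors, function(p) {
  in_model <- vapply(subsets, function(vars) p %in% vars, logical(1))
  sum(w[in_model])
}, numeric(1))

# Table of all models, sorted from lowest to highest AIC.
results <- data.frame(
  model  = vapply(subsets, function(v) {
    if (length(v)) paste(v, collapse = " + ") else "1"
  }, character(1)),
  aic    = aic,
  weight = w
)
results <- results[order(results$aic), ]
print(results)
print(round(cum_w, 3))
```

The number of models doubles with every added predictor: 4 predictors give
16 models and 7 predictors give 128, which is why run time grows so quickly.
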
Benjamin,
The statistical properties of this approach are on a par with George
Bush's financial modeling.
One way to see that is to use simulations with lm to compute the bias in
estimating sigma^2.
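
A sketch of what such a simulation might look like, under assumptions that
are not in the original post: 5 pure-noise candidate predictors, n = 30, a
true sigma^2 of 1, and 200 replications. In each replication every subset is
fitted with lm(), the minimum-AIC model is selected, and its estimate of
sigma^2 is compared with the estimates from the true (intercept-only) model
and from the full model. All object names here are illustrative.

```r
set.seed(2010)
n      <- 30    # observations per simulated data set (assumed)
p      <- 5     # candidate predictors, all unrelated to y (assumed)
sigma2 <- 1     # true residual variance (assumed)
n_sim  <- 200   # number of simulated data sets (assumed)

predictors <- paste0("x", seq_len(p))

# All 2^p subsets; the first is the empty set, the last is the full set.
subsets <- list(character(0))
for (k in seq_len(p)) {
  subsets <- c(subsets, combn(predictors, k, simplify = FALSE))
}

one_rep <- function() {
  dat <- as.data.frame(
    matrix(rnorm(n * p), n, p, dimnames = list(NULL, predictors))
  )
  dat$y <- rnorm(n, sd = sqrt(sigma2))  # y is pure noise

  fits <- lapply(subsets, function(vars) {
    rhs <- if (length(vars)) vars else "1"
    lm(reformulate(rhs, response = "y"), data = dat)
  })
  aic <- vapply(fits, AIC, numeric(1))
  s2  <- vapply(fits, function(f) summary(f)$sigma^2, numeric(1))

  c(selected   = s2[which.min(aic)],  # model chosen by minimum AIC
    true_model = s2[1],               # intercept-only model
    full       = s2[length(s2)])      # all p predictors
}

sims <- t(replicate(n_sim, one_rep()))

# Estimated bias of each sigma^2 estimator (mean estimate minus truth).
bias <- colMeans(sims) - sigma2
print(round(bias, 3))
```

The true-model and full-model estimates are unbiased in theory, so their
simulated bias should be near zero apart from Monte Carlo error. In this
setup the selected model's estimate can never exceed the intercept-only
estimate from the same data set, so its average comes out lower.
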
Frank
>
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of kcleary2
> Sent: Friday, February 12, 2010 3:19 PM
> To: r-help at r-project.org
> Subject: [R] all possible subsets, with AIC
>
>
>
> Hello,
>
> I have a question about doing ALL possible subsets regression with a
> general linear model. My goal is to produce cumulative Akaike weights
> for each of 7 predictor variables. To obtain this, I need R to:
>
> 1. Show me ALL possible subsets, not just the best possible subsets
>
> 2. Give me an AIC value for each model (instead of a BIC value).
>
> I have tried to do this in library(RcmdrPlugin.HH), and using the "leaps"
> code below. With the leaps code my problem is that my response is not a
> vector, it's a single value (density of a species).
>
> Any help would be greatly appreciated. Thanks a lot,
> Kate
>
> ALL-SUBSETS REGRESSION
>
> DESCRIPTION
>
> leaps() performs an exhaustive search for the best subsets of the
> variables in x for predicting y in linear regression, using an efficient
> branch-and-bound algorithm. It is a compatibility wrapper for regsubsets
> [1], which does the same thing better.
>
> Since the algorithm returns a best model of each size, the results do not
> depend on a penalty model for model size: it doesn't make any difference
> whether you want to use AIC, BIC, CIC, DIC, ...
>
> USAGE
>
> leaps(x=, y=, wt=rep(1, NROW(x)), int=TRUE, method=c("Cp", "adjr2",
> "r2"), nbest=10, names=NULL, df=NROW(x),
> strictly.compatible=TRUE)
>
> ARGUMENTS
>
> x
> A matrix of predictors
>
> y
> A response vector
>
> wt
> Optional weight vector
>
> int
> Add an intercept to the model
>
> method
> Calculate Cp, adjusted R-squared or R-squared
>
> nbest
> Number of subsets of each size to report
>
> names
> vector of names for columns of x
>
> df
> Total degrees of freedom to use instead of nrow(x) in calculating Cp and
> adjusted R-squared
>
> strictly.compatible
> Implement misfeatures of leaps() in S
>
> --
> Kate Cleary
> MS Candidate
> Department of Fish, Wildlife, and Conservation Biology Colorado State
> University Fort Collins, CO
> 970-491-3535
>
>
>
> Links:
> ------
> [1] https://webmail.warnercnr.colostate.edu/leaps/help/regsubsets
>
--
Frank E Harrell Jr   Professor and Chairman        School of Medicine
                     Department of Biostatistics   Vanderbilt University