[R] Regsubsets statistics
Thomas Lumley
tlumley at u.washington.edu
Thu Aug 9 14:46:31 CEST 2007
On Wed, 8 Aug 2007, Markus Brugger wrote:
>
> Dear R-help,
>
> I have used the regsubsets function from the leaps package to do subset
> selection of a logistic regression model with 6 independent variables and
> all possible ^2 interactions. As I want to get information about the
> statistics behind the selection output, I've intensively searched the
> mailing list to find answers to the following questions:
>
> 1. What should I do to get the statistics behind the selection (e.g. BIC)?
> summary.regsubsets(object) just returns "*" meaning "in" or " " meaning "out".
> Since the plot function shows BICs, these values must obviously be
> computed and available somewhere, but where? Is it possible to directly
> get AIC values instead of BIC?
These statistics are in the object returned by summary(). Using the first example from the help page:
> names(summary(a))
[1] "which" "rsq" "rss" "adjr2" "cp" "bic" "outmat" "obj"
> summary(a)$bic
[1] -19.60287 -28.61139 -35.65643 -37.23388 -34.55301
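The summary has no "aic" component, but for a linear model an AIC can be computed, up to an additive constant, from the "rss" component. A minimal sketch, assuming the help-page example is the one using the swiss data, and using the Gaussian formula n*log(RSS/n) + 2k (my choice here, not something leaps computes for you):

```r
library(leaps)

# Assumed to match the first example on the regsubsets help page
data(swiss)
a <- regsubsets(as.matrix(swiss[, -1]), swiss[, 1])
s <- summary(a)

n <- nrow(swiss)
# Number of coefficients in each model, counting the intercept column of 'which'
k <- rowSums(s$which)

# Gaussian AIC up to an additive constant that is the same for every model
aic <- n * log(s$rss / n) + 2 * k

cbind(size = k - 1, bic = s$bic, aic = aic)
```

Only differences between models are meaningful here, since the constant has been dropped.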
> 2. As to the plot function, I've encountered a problem with setting the ylim
> argument. I fear that this (nice!) particular plot function ignores many of
> these additional arguments. How can I nevertheless change this setting?
You can't (without modifying the plot function). The ... argument is required for inheritance [i.e., required for R CMD check], but it doesn't take graphical parameters.
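If all you need is control over the axis limits, one workaround is to draw your own plot from the values in summary() with base graphics. This is only a sketch: it plots BIC against model size rather than reproducing the display from the regsubsets plot method, and the ylim values are arbitrary.

```r
library(leaps)

# Assumed to match the first example on the regsubsets help page
data(swiss)
a <- regsubsets(as.matrix(swiss[, -1]), swiss[, 1])
s <- summary(a)

# Number of predictors in each model (excluding the intercept)
size <- rowSums(s$which) - 1

# An ordinary base-graphics plot, so ylim and other graphical parameters work
plot(size, s$bic, type = "b",
     ylim = c(-40, -15),   # arbitrary limits chosen for illustration
     xlab = "Number of predictors", ylab = "BIC")
```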
> 3. Since it is not explicitly mentioned in the manual, can I really use
> regsubsets for logistic regression?
>
No. If your data set is large enough relative to the number of variables, you can fit a model with all variables and then apply regsubsets() to the weighted linear model arising from the IWLS algorithm. This will give an approximate ranking of models that you can then refit exactly. This is useful if you want to summarize the best few thousand models on 30 variables, but not if you want a single model. On the other hand, regsubsets() isn't useful if you want a single model anyway.
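A rough sketch of that approach, on simulated data. The data frame d, the variables V1..V6, and the choice nbest = 3 are made up for illustration. The working response is rebuilt from the components of the fitted glm object (linear predictor plus working residuals, with no offset assumed).

```r
library(leaps)

# Simulated data, purely for illustration
set.seed(1)
n <- 500
d <- as.data.frame(matrix(rnorm(n * 6), n, 6))   # columns V1..V6
d$y <- rbinom(n, 1, plogis(0.8 * d$V1 - 0.6 * d$V2))

# Full logistic model with all variables
full <- glm(y ~ ., data = d, family = binomial)

# Weighted linear model from the final IWLS iteration:
# working weights, and working response = linear predictor + working residuals
w <- full$weights
z <- full$linear.predictors + full$residuals
X <- model.matrix(full)[, -1, drop = FALSE]      # drop the intercept column

# Approximate ranking of subsets
rs <- regsubsets(X, z, weights = w, nbest = 3)
s <- summary(rs)

# Refit one of the candidate models exactly with glm()
best <- which.min(s$bic)
vars <- setdiff(colnames(s$which)[s$which[best, ]], "(Intercept)")
refit <- glm(reformulate(vars, response = "y"), data = d, family = binomial)
```

The statistics in s describe the approximating linear model only. Any comparison that matters should be made on exact refits like the last line, applied to each candidate of interest.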
-thomas
Thomas Lumley Assoc. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle