[R] Bestglm subset analysis
Jim Lemon
drjimlemon at gmail.com
Thu Jun 30 03:17:32 CEST 2016
Hi Doug,
To expand a bit on what Bert has written, all of the "best
subset/best model" procedures capitalize on random variation in the
dataset when they pick a model. This means that your "best model"
will almost certainly include variables whose effects cannot be
replicated. Sometimes you can spot this when a variable is included
that, on the basis of current knowledge, shouldn't make any
difference to the response variable. You can often identify such
problems by replication, that is, by checking whether the same
variables turn up again in a fresh sample. Whenever you use an
automated procedure like this, it's up to you to provide evidence
that the result is not peculiar to the dataset, especially when many
measures are taken on few cases.
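If a fresh sample is not available, a rough stand-in is to rerun the selection on bootstrap resamples and count how often each predictor survives. What follows is only a sketch under stated assumptions: mtcars and the six predictors named here are placeholders for the real data, and base R's step() (backward stepwise AIC) is swapped in for the exhaustive bestglm() search so that the example needs no extra packages.

```r
# Sketch: selection stability across bootstrap resamples.
# Assumptions (not from the original post): mtcars is stand-in data,
# and step() replaces the exhaustive bestglm() search.
set.seed(1)
dat        <- mtcars
response   <- "mpg"
predictors <- c("cyl", "disp", "hp", "drat", "wt", "qsec")
n_boot     <- 50

# One row per resample, one column per predictor; TRUE = predictor was kept.
selected <- matrix(FALSE, nrow = n_boot, ncol = length(predictors),
                   dimnames = list(NULL, predictors))

for (b in seq_len(n_boot)) {
  # Resample the cases with replacement.
  boot <- dat[sample(nrow(dat), replace = TRUE), ]
  # Start from the full model and let AIC drop terms.
  full <- lm(reformulate(predictors, response), data = boot)
  fit  <- step(full, direction = "backward", trace = 0)
  kept <- attr(terms(fit), "term.labels")
  selected[b, kept] <- TRUE
}

# Proportion of resamples in which each predictor was selected.
inclusion_freq <- sort(colMeans(selected), decreasing = TRUE)
print(round(inclusion_freq, 2))
```

A predictor that is kept in only a minority of the resamples is one whose place in the "best model" probably owes a lot to chance. Resampling the same 79 cases is weaker evidence than a genuinely new sample, so treat it as a screening check rather than a proof.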
Jim
On Thu, Jun 30, 2016 at 4:24 AM, D Wolf via R-help <r-help at r-project.org> wrote:
> Hello All,
> I am working on a linear regression model and trying to find the best subset of variables for my dataset. I have 21 predictors, 1 response variable, and 79 observations. I need to find the best 5 or 6 predictors for my model. I've used leaps for lm() and I'm now trying bestglm for glm(). I'm following this webpage, which gives the code below. https://rstudio-pubs-static.s3.amazonaws.com/2897_9220b21cfc0c43a396ff9abf122bb351.html
> My code:
> library(bestglm)
> library(base)
> lbw.for.bestglm <- within(df_Chl, {y <- df_Chl$Chloro })
> res.bestglm <- bestglm(Xy = lbw.for.bestglm, family = gaussian, IC = "AIC", method = "exhaustive")
> # get coefficients
> res.bestglm$BestModels
> Here is a sample of my results (I removed the 5th through 21st predictors for brevity).
> > res.bestglm$BestModels
>     R21   R31   R32   R41 Criterion
> 1 FALSE FALSE FALSE FALSE  326.7327
> 2 FALSE  TRUE FALSE FALSE  326.9525
> 3 FALSE FALSE FALSE FALSE  327.0659
> 4 FALSE  TRUE FALSE FALSE  327.0912
> 5 FALSE  TRUE FALSE FALSE  327.8208
> Is it correct to assume I should keep variables that are TRUE from 1 through 5? What do those five rows represent?
> I know the AIC criterion result should be as low as possible. Is it possible to discern a good result for any of the IC criterion results, such as AIC, LOOCV, BICg, etc..? If BIC returns lower Criterion results, does that mean I need to use the BIC subset instead of the subset from AIC?
> Thank You,
> Doug
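On the question above about whether a lower BIC value means the BIC subset should replace the AIC one: values of different criteria are on different scales, so they can only rank models against each other under the same criterion and on the same data. A minimal sketch in base R, assuming mtcars and two arbitrary formulas as stand-ins (neither is from the original post):

```r
# Sketch with base R only; the data and both models are placeholders.
small <- lm(mpg ~ wt, data = mtcars)
large <- lm(mpg ~ wt + hp + qsec, data = mtcars)
n     <- nobs(small)

# Compare models down a column, never across columns.
ic <- data.frame(
  model = c("small", "large"),
  AIC   = c(AIC(small), AIC(large)),
  BIC   = c(BIC(small), BIC(large))
)
print(ic)

# BIC is AIC with the per-parameter penalty changed from 2 to log(n),
# which is why the two columns are not comparable with each other.
bic_by_hand <- AIC(large, k = log(n))
```

Here the BIC column is larger than the AIC column for every model simply because log(n) exceeds 2, which says nothing about which criterion found the better subset.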