[R] Bestglm subset analysis
D Wolf
doug45290 at yahoo.com
Wed Jun 29 20:24:50 CEST 2016
Hello All,
I am working on a linear regression model and trying to find the best subset of variables for my dataset. I have 21 predictors, 1 response variable, and 79 observations. I need to find the best 5 or 6 predictors for my model. I've used leaps for lm() and I'm now trying bestglm for glm(). I'm following this webpage, which gives the code below. https://rstudio-pubs-static.s3.amazonaws.com/2897_9220b21cfc0c43a396ff9abf122bb351.html
My code:

library(bestglm)
library(base)
lbw.for.bestglm <- within(df_Chl, {y <- df_Chl$Chloro })
res.bestglm <- bestglm(Xy = lbw.for.bestglm, family = gaussian, IC = "AIC", method = "exhaustive")

# get coefficients
res.bestglm$BestModels

Here is a sample of my results (I removed the 5th through 21st predictors for brevity).

> res.bestglm$BestModels
    R21   R31   R32   R41
1 FALSE FALSE FALSE FALSE
2 FALSE  TRUE FALSE FALSE
3 FALSE FALSE FALSE FALSE
4 FALSE  TRUE FALSE FALSE
5 FALSE  TRUE FALSE FALSE
  Criterion
1  326.7327
2  326.9525
3  327.0659
4  327.0912
5  327.8208
Is it correct to assume that I should keep the variables marked TRUE in rows 1 through 5? What do those five rows represent?
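To make the question about the five rows concrete, here is a minimal hand-rolled sketch of an exhaustive subset search that produces a table of the same shape. It assumes the built-in mtcars data with four arbitrarily chosen predictors (not df_Chl), and it is not bestglm's own implementation. As far as the bestglm documentation describes it, BestModels lists the best few models found (five by default), one per row, with TRUE marking the predictors included in that model. The Criterion values bestglm reports may be computed differently from AIC() below.

```r
# Illustration only: built-in mtcars stands in for the real data, and the
# predictor choice is an assumption made for this sketch.
dat <- mtcars[, c("mpg", "wt", "hp", "qsec", "drat")]
predictors <- setdiff(names(dat), "mpg")

# Every on/off combination of the predictors: one row per candidate model.
grid <- expand.grid(rep(list(c(FALSE, TRUE)), length(predictors)))
names(grid) <- predictors

# Fit each candidate with glm() and record its AIC.
grid$Criterion <- apply(grid[predictors], 1, function(keep) {
  rhs <- if (any(keep)) predictors[keep] else "1"  # "1" = intercept-only model
  AIC(glm(reformulate(rhs, response = "mpg"), family = gaussian, data = dat))
})

# Keep the five lowest-AIC candidates, best first.
top5 <- head(grid[order(grid$Criterion), ], 5)
rownames(top5) <- NULL
print(top5)
```

Read this way, each row is a separate candidate model rather than a step toward one combined model, and row 1 is the subset with the lowest criterion value.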
I know the AIC value should be as low as possible. Is there a way to judge what counts as a good value for any of the criteria, such as AIC, LOOCV, BICg, etc.? If BIC returns lower Criterion values, does that mean I should use the BIC subset instead of the AIC subset?
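A minimal sketch of why values from different criteria sit on different scales, using base R and the built-in mtcars data (an assumption for illustration, not df_Chl). With the standard definitions, AIC and BIC share the same log-likelihood term and differ only in the penalty, so for the same model the gap between them is fixed by the number of parameters and the sample size. A lower number under one criterion than under another therefore says nothing about which subset is better; comparisons only make sense between models scored with the same criterion.

```r
# Illustration only: one gaussian model scored with both criteria.
fit <- glm(mpg ~ wt + hp, family = gaussian, data = mtcars)

aic <- AIC(fit)   # -2 * logLik + 2 * k
bic <- BIC(fit)   # -2 * logLik + log(n) * k

n <- nrow(mtcars)
k <- attr(logLik(fit), "df")  # parameters counted by logLik(), incl. error variance

# The two values differ by a penalty term only: BIC - AIC = k * (log(n) - 2).
print(c(AIC = aic, BIC = bic, gap = bic - aic))
print(all.equal(bic - aic, k * (log(n) - 2)))
```

bestglm may report its criteria up to different constants than AIC() and BIC() do, so treat the sketch as showing the general relationship, not bestglm's exact numbers.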
Thank You,
Doug