[R] Lowest AIC after stepAIC can be lowered by manual reduction of variables

Greg Snow 538280 at gmail.com
Tue Sep 18 17:47:12 CEST 2012


Do you understand what you did (not the individual steps, but what the
overall process does)?

You simplified your model using criteria other than the AIC (p-values,
in your case).  If you go back and look at the AIC at each step of your
manual reduction, you will probably find that some of the intermediate
models had a slightly higher AIC value, and that is why the step
function stopped where it did: it only moves to a model that lowers
the AIC at that step.  It is common for stepwise methods to give
different final models depending on where they are started and what
options are used, and even then they are not guaranteed to give the
"best" model, even when you can determine what "best" means.  Stepwise
methods are often a complicated equivalent of throwing darts
blindfolded (the final model owes more to random chance than to
anything else).
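
If it helps to see this on something concrete, here is a minimal sketch
on simulated data.  Everything in it is an assumption made for
illustration: the data frame d, the variables x1..x4 and y, and the
loop that imitates your manual reduction with a 0.1 cutoff.  It is not
your d1 and not necessarily the exact procedure you followed.  It
prints the one-term deletions that stepAIC considered at its stopping
point, the AIC along a p-value-driven reduction, and the result of a
forward search from the empty model.

```r
library(MASS)  # stepAIC(), dropterm()

set.seed(42)
# Simulated stand-in data (hypothetical; NOT the poster's d1):
# only x1 truly affects y, x2..x4 are pure noise.
n <- 40
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- 1 + 2 * d$x1 + rnorm(n)

full <- glm(y ~ (x1 + x2 + x3 + x4)^2, data = d)  # all two-way interactions
auto <- stepAIC(full, trace = FALSE)

# 1. Why stepAIC stopped: one-term deletions from its final model.
#    The "<none>" row is the model itself; stepAIC stops when no
#    single-step change gives a lower AIC than that row.
drops <- dropterm(auto)
print(drops)

# 2. Manual reduction by p-value (cutoff 0.1), recording AIC at each step.
fit <- auto
aic_path <- AIC(fit)
repeat {
  if (length(attr(terms(fit), "term.labels")) == 0) break
  tab <- dropterm(fit, test = "F")   # respects marginality
  p <- tab[[ncol(tab)]][-1]          # last column holds the p-values
  if (all(p < 0.1)) break
  worst <- rownames(tab)[-1][which.max(p)]
  fit <- update(fit, as.formula(paste(". ~ . -", worst)))
  aic_path <- c(aic_path, AIC(fit))
}
print(aic_path)  # need not decrease at every step

# 3. A different start and direction can end at a different model.
null <- glm(y ~ 1, data = d)
fwd <- stepAIC(null,
               scope = list(lower = ~ 1, upper = ~ (x1 + x2 + x3 + x4)^2),
               direction = "forward", trace = FALSE)
print(formula(auto))
print(formula(fwd))
```

If aic_path goes up at some step before coming back down, that is the
situation you ran into: a move that stepAIC will not make because it
raises the AIC can still lead, a few steps later, to a model with a
lower AIC.  Whether that happens depends on the data, so try a few
different seeds.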

What question are you trying to answer?  What model makes the most
sense scientifically?

On Tue, Sep 18, 2012 at 7:27 AM, Florian Moser <floserx2 at yahoo.de> wrote:
> Hello
> I am not really a statistics person, so it's possible I did something completely wrong... if this is the case: sorry...
> I am trying to find the best GLM (the one with the lowest AIC) for my dataset.
> To do so I ran stepAIC (from the "MASS" package) on my GLM, allowing only two-variable interactions.
> The output (summary) was a model with 7 (of 8) variables and 5 interactions, with AIC=40.008.
> BUT: when I take this model and remove further variables manually, one at a time (starting with the one with the highest p-value, and removing all interactions of a variable before removing the variable itself), until nothing more can be removed because every variable (or one of its interactions) has a p-value < 0.1, I get a model with 4 variables and 2 interactions and an AIC of 33.879.
> So my questions: Why didn't stepAIC give me the model with AIC=33.879?
> And which model should I think of as the best?
>
> For my calculations I used these formulae:
> library(MASS)
> gm1 <- glm(cpi ~ time + tank + ..., data = d1)
> gm2 <- stepAIC(gm1)
> summary(gm2)
> # to get the "best" model -> AIC=40.008
> # afterwards I reduced manually using the formula:
> summary(glm(cpi ~ time + tank + ..., data = d1))
> # giving me a model with AIC=33.879
>
> Hope you understand what I did, and that you can help me.
> Thanks
> Florian



-- 
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com



