[R] Lowest AIC after stepAIC can be lowered by manual reduction of variables (Florian Moser)
Claas Damken
c.damken at auckland.ac.nz
Thu Sep 20 01:08:08 CEST 2012
A few general comments about stepwise AIC selection and a suggestion for how to select models:
a) Apart from the problem that stepwise selection is not guaranteed to find the best model, you need a lot of data to avoid overfitting if your model includes 7 parameters plus interactions (more than 10 observations per parameter is what you are ideally looking for).
b) Have a look at Burnham and Anderson's 2002 book on multimodel inference if you want to understand how to use AIC properly.
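As a rough illustration of the rule of thumb in a), the ratio of observations to estimated coefficients can be read off a fitted model. This is only a sketch on R's built-in mtcars data; the model formula and the object names (full, n_obs, n_params, obs_per_param) are assumptions made for the example and have nothing to do with the original data set.

```r
# Hypothetical example on the built-in mtcars data (not the poster's data).
# Three main effects plus all two-way interactions.
full <- glm(mpg ~ (wt + hp + qsec)^2, data = mtcars)

n_obs    <- nobs(full)          # observations actually used in the fit
n_params <- length(coef(full))  # estimated coefficients, incl. intercept

# Rule of thumb from a): ideally more than 10 observations per parameter
obs_per_param <- n_obs / n_params
obs_per_param
```

A ratio well below 10 suggests the candidate model is too rich for the amount of data, whatever the selection procedure reports.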
What I'm doing for my analysis at the moment (count data of two species, host and herbivore):
1) I checked which of my parameters explained the abundance of each species, using GLMs and a bootstrapped LR test to check whether the model with the parameter is better than the one without it (one way to deal with outliers and extreme values).
2) Then I built all combinations of those parameters that predicted the two species well (p-values < 0.05 and > 95% successful bootstrap runs).
3) I wrote down all the multi-parameter models with decent p-values and calculated AICc (AICc corrects AIC for small data sets, and can be used in any case because for very large N AICc almost equals AIC).
(The package glmulti builds all the combination models, and you can set limits on the number of parameters, interactions, etc.)
4) I manually calculated the weight of each properly performing model based on its AICc. This gives you a good idea of which model is best and how good it is compared to all the other models considered. You can also calculate weights for each parameter, which is very useful if several models are equally good. In my case, the better models had only one or two parameters, but were ecologically meaningful and not just the result of data dredging.
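Steps 3) and 4) can be sketched in a few lines of base R. This is a minimal illustration, not the code used for the analysis described above: the candidate set, the built-in warpbreaks count data, and the helper names aicc and has_term are all assumptions made for the example. It uses the usual small-sample correction AICc = AIC + 2k(k + 1) / (n - k - 1) and Akaike weights proportional to exp(-delta / 2).

```r
# Hypothetical candidate set on the built-in warpbreaks count data.
models <- list(
  null    = glm(breaks ~ 1,              family = poisson, data = warpbreaks),
  wool    = glm(breaks ~ wool,           family = poisson, data = warpbreaks),
  tension = glm(breaks ~ tension,        family = poisson, data = warpbreaks),
  both    = glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)
)

# AICc = AIC + 2k(k + 1) / (n - k - 1), with k estimated parameters
aicc <- function(m) {
  k <- attr(logLik(m), "df")
  n <- nobs(m)
  AIC(m) + 2 * k * (k + 1) / (n - k - 1)
}

aicc_vals <- sapply(models, aicc)
delta     <- aicc_vals - min(aicc_vals)                # distance to best model
weights   <- exp(-delta / 2) / sum(exp(-delta / 2))    # Akaike weights

# Parameter weight: summed weight of all models that contain the term
has_term  <- function(m, term) term %in% attr(terms(m), "term.labels")
w_wool    <- sum(weights[sapply(models, has_term, term = "wool")])
w_tension <- sum(weights[sapply(models, has_term, term = "tension")])

round(cbind(AICc = aicc_vals, delta = delta, weight = weights), 3)
```

The weights only rank the models relative to each other within this candidate set, so the set itself should be chosen on subject-matter grounds first.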
Hope this helps,
Cheers
Claas Damken
PhD candidate
School of Environment
The University of Auckland | Te Whare Wananga o Tamaki Makaurau
New Zealand
________________________________________
From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] on behalf of r-help-request at r-project.org [r-help-request at r-project.org]
Sent: Wednesday, 19 September 2012 22:00
To: r-help at r-project.org
Subject: R-help Digest, Vol 115, Issue 19
------------------------------
Message: 12
Date: Tue, 18 Sep 2012 14:27:34 +0100 (BST)
From: Florian Moser <floserx2 at yahoo.de>
To: r-help at r-project.org
Subject: [R] Lowest AIC after stepAIC can be lowered by manual
reduction of variables
Message-ID:
<1347974854.4978.YahooMailClassic at web28904.mail.ir2.yahoo.com>
Content-Type: text/plain
Hello
I am not really a statistics person, so it's possible I did something completely wrong... if this is the case: sorry...
I am trying to get the best GLM (the one with the lowest AIC) for my dataset.
To do so, I ran stepAIC (from the "MASS" package) on my GLM, allowing only two-variable interactions.
In the output (summary) I got a model with 7 (of 8) variables and 5 interactions and AIC=40.008
BUT: When I take this model and manually remove further variables step by step (starting with the ones with the highest p-values, and removing all interactions of a variable before I remove the variable itself) until I can't remove any more because every variable (or one of its interactions) has a p-value < 0.1, I get a model with 4 variables and 2 interactions and an AIC of 33.879
So my questions: Why didn't stepAIC give me the model with AIC=33.879?
And which model should I think of as the best?
For my calculations I used these formulae:
library(MASS)  # provides stepAIC()
gm1 <- glm(cpi ~ time + tank + ..., data = d1)
gm2 <- stepAIC(gm1)
summary(gm2)
# to get the "best" model -> AIC=40.008
# afterwards I reduced manually using the formula:
summary(glm(cpi ~ time + tank + ..., data = d1))
# giving me a model with AIC=33.879
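One way to check such a discrepancy is to put the automatically and manually selected models side by side on the same data. The sketch below is only an illustration on the built-in mtcars data: the formulas and the object names full, auto and manual are assumptions and stand in for the elided model above. It relies on stepAIC adding or dropping one term at a time and stopping as soon as no single step lowers the AIC, so a reduction guided by p-values can follow a different path and end at a different model.

```r
library(MASS)  # stepAIC()

# Hypothetical stand-in for the poster's model, on built-in mtcars data.
full <- glm(mpg ~ (wt + hp + qsec)^2, data = mtcars)

# Search in both directions; the upper scope limits the search to
# main effects and two-way interactions.
auto <- stepAIC(full,
                scope = list(lower = ~ 1, upper = ~ (wt + hp + qsec)^2),
                direction = "both", trace = FALSE)

# A manually reduced model (chosen arbitrarily for illustration)
manual <- glm(mpg ~ wt + hp + wt:hp, data = mtcars)

# AIC values are only comparable if both fits use the same response
# and the same rows (e.g. no rows dropped differently because of NAs).
same_rows <- nobs(auto) == nobs(manual)

# One table with the degrees of freedom and AIC of each fit
AIC(full, auto, manual)
```

If the manual model really has a lower AIC on identical data, that by itself is no error in stepAIC, only a sign that the greedy search stopped at a different model.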
Hope you understand what I did, and that you can help me.
Thanks
Florian