[R] Insignificant variable improves AIC (multinom)?
Ravi Varadhan
rvaradhan at jhmi.edu
Sat Jun 13 19:37:07 CEST 2009
Hi Werner,
AICs of nested models are compared on an additive scale, not a multiplicative one. So what matters is by how much the AIC decreases when you add the new variable, not the factor by which it shrinks.
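Here is a small sketch of what I mean by the additive scale. It uses simulated data and a binomial glm as a stand-in, purely for illustration (the variable names and the data are made up; as far as I know AIC() and logLik() can be called the same way on multinom fits):

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)                          # pure noise, unrelated to y
y  <- rbinom(n, 1, plogis(0.5 + x1))

fit_small <- glm(y ~ x1,      family = binomial)
fit_big   <- glm(y ~ x1 + x2, family = binomial)

aic_small <- AIC(fit_small)
aic_big   <- AIC(fit_big)

# The quantity to look at is the difference, not the ratio
delta_aic <- aic_big - aic_small

# AIC = -2 * logLik + 2 * k, so for one extra parameter
# delta_aic equals 2 minus the likelihood ratio statistic
lr_stat <- 2 * (as.numeric(logLik(fit_big)) - as.numeric(logLik(fit_small)))

c(delta_aic = delta_aic, lr_stat = lr_stat)
```

A negative delta_aic favours the bigger model. The ratio aic_big / aic_small has no such interpretation, because it changes if the log-likelihood is shifted by a constant while the difference does not.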
If you are doing stepwise selection based on AIC, then the p-value approach and the AIC approach are related. Since AIC = -2 log-likelihood + 2 (number of parameters), adding a term lowers the AIC exactly when the likelihood ratio statistic (the drop in deviance) exceeds 2 per added parameter. For a term with a single parameter, this is the same as a stepwise likelihood ratio test (LRT; the F-test plays this role in linear regression) with a p-value cutoff of roughly 0.15, i.e. the variable is entered if the p-value for the LRT is below about 0.15 and deleted if it is above.
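To see where the 0.15 figure comes from, here is a short calculation in base R. The AIC goes down when the likelihood ratio statistic exceeds twice the number of added parameters, so the implied p-value cutoff is the upper tail of the corresponding chi-square distribution. The range of degrees of freedom is just my choice for illustration; in a multinomial logit with K outcome categories, one numeric predictor adds K - 1 coefficients.

```r
# Number of parameters added by the new term
extra_df <- 1:4

# AIC decreases exactly when the LR statistic exceeds 2 * extra_df
lr_cutoff <- 2 * extra_df

# Implied p-value cutoff of the corresponding likelihood ratio test
p_cutoff <- pchisq(lr_cutoff, df = extra_df, lower.tail = FALSE)

round(data.frame(extra_df, lr_cutoff, p_cutoff), 3)
```

For a single added parameter the cutoff is about 0.157, and it gets smaller as the term brings in more parameters.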
Of course, the big issue is that the sampling properties of stepwise model selection procedures are extremely difficult to characterize. Resampling and cross-validation approaches can help address this problem. Another more principled approach to model selection is to use regularization methods (e.g. ridge, lasso). But there is no free lunch. In regularization methods, one has to decide on the degree of regularization.
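As a rough sketch of the cross-validation idea, one can compare the two candidate models by their out-of-fold deviance instead of by in-sample AIC or p-values. The data are simulated, cv_deviance() is a helper I made up for this example, and a binomial glm again stands in for the multinomial model:

```r
set.seed(2)
n   <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.5 + dat$x1))     # x2 is pure noise

# Use the same fold assignment for both models
k    <- 10
fold <- sample(rep(seq_len(k), length.out = n))

# Mean out-of-fold deviance (-2 * log-likelihood per observation)
cv_deviance <- function(formula, data, fold) {
  dev <- numeric(max(fold))
  for (i in seq_len(max(fold))) {
    train <- data[fold != i, ]
    test  <- data[fold == i, ]
    fit   <- glm(formula, family = binomial, data = train)
    p     <- predict(fit, newdata = test, type = "response")
    dev[i] <- -2 * mean(test$y * log(p) + (1 - test$y) * log(1 - p))
  }
  mean(dev)
}

cv_small <- cv_deviance(y ~ x1,      dat, fold)
cv_big   <- cv_deviance(y ~ x1 + x2, dat, fold)

c(small = cv_small, big = cv_big)
```

The model with the smaller out-of-fold deviance predicts new data better. With a different seed or a different number of folds the comparison can change, which is itself a reminder of how unstable these selection procedures are.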
I hope I have successfully convinced you about the perils and pitfalls of model selection.
Best,
Ravi.
____________________________________________________________________
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University
Ph. (410) 502-2619
email: rvaradhan at jhmi.edu
----- Original Message -----
From: Werner Wernersen <pensterfuzzer at yahoo.de>
Date: Saturday, June 13, 2009 10:52 am
Subject: Re: [R] Insignificant variable improves AIC (multinom)?
To: Peter Flom <peterflomconsulting at mindspring.com>, r-help at stat.math.ethz.ch
> > > Hi,
> > >
> > > I am trying to specify a multinomial logit model using the multinom
> > > function from the nnet package. Now I add another independent
> > > variable and it halves the AIC as given by summary(multinom()). But
> > > when I call Anova(multinom()) from the car package, it tells me that
> > > this added variable is insignificant (Pr(>Chisq)=0.39). Thus, the
> > > improved AIC suggests to keep the variable but the Anova suggests to
> > > drop it.
> > >
> > > I am sure this is due to my lack of understanding of these models
> > > but could someone help me out with a pointer what my mistake is?
> >
> > I am not sure why you would expect the same answer from AIC and
> > p-value. They are different questions. AIC attempts to answer a
> > question about overall model fit. p-value for a particular variable
> > attempts to answer whether that particular coefficient could be due
> > to chance if the population value of the parameter was 0.
> >
> > One way these could give different answers is if the new variable
> > affected the parameter estimates for the other parameters.
> >
> > It's yet another exemplar of the problems with using p-values for
> > model selection
> >
> > HTH
> >
> > Peter
> >
> > Peter L. Flom, PhD
> > Statistical Consultant
> > www DOT peterflomconsulting DOT com
>
> That was very enlightening. I have to read up on model selection. The
> thought I have to get my head around is that the added variable helps
> explain the observed variability in the data and thus should be
> retained in the model. But since the coefficient is insignificant, I
> cannot interpret it, and if I use this equation for predictions then I
> add a "random" value, since I cannot reject that the coefficient is
> actually zero instead of what I estimated.
>
> One just never sees someone presenting regression coefficients that
> are not significant, although model selection procedures are often
> based on the AIC...
>
> Have a good weekend,
> Werner