[R] Stepwise logistic regression with significance testing - stepAIC

Greg Snow Greg.Snow at imail.org
Tue May 5 18:23:35 CEST 2009


There is no meaningful alternative, because the approach you propose is not itself meaningful.  Wald tests have some known problems even in well-defined cases.  More fundamentally, both the Wald and the likelihood ratio tests are designed to test a hypothesis specified in advance, not a hypothesis that is conditional on the outcome of the stepwise procedure, so p-values computed on the selected model do not mean what they appear to mean.  It is better to use an approach other than stepwise selection, which has been shown to give biased results; the lasso is one such alternative.  If you must use stepwise selection, then bootstrap the entire selection process to get better estimates and standard errors.
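A minimal sketch of what "bootstrap the entire selection process" could look like with MASS::stepAIC. The simulated data (shaped like the question's) and B = 200 resamples are arbitrary illustration choices, not part of the original advice. The key point is that stepAIC is rerun on every resample, so the selection frequencies and the standard errors (with a coefficient set to 0 whenever its predictor is dropped) include the variability of the selection step itself.

```r
# Sketch: bootstrap the whole stepAIC selection, not just the final model.
# Assumed for illustration: simulated data like the question's, B = 200.
library(MASS)

set.seed(1)
n <- 30
xdata <- data.frame(y  = rbinom(n, 1, 0.4),
                    x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
predictors <- c("x1", "x2", "x3")

B <- 200
selected <- matrix(FALSE, nrow = B, ncol = length(predictors),
                   dimnames = list(NULL, predictors))
coefs <- matrix(0, nrow = B, ncol = length(predictors),
                dimnames = list(NULL, predictors))

for (b in seq_len(B)) {
  bdat <- xdata[sample(n, replace = TRUE), ]
  # Small resamples can trigger separation warnings; they are silenced
  # here only to keep the output readable.
  full <- suppressWarnings(glm(y ~ x1 + x2 + x3, family = binomial,
                               data = bdat))
  chosen <- suppressWarnings(stepAIC(full, trace = FALSE))  # full reselection
  cf <- coef(chosen)
  kept <- intersect(names(cf), predictors)
  selected[b, kept] <- TRUE
  coefs[b, kept] <- cf[kept]   # dropped predictors keep a coefficient of 0
}

# Share of resamples in which each predictor survives selection
sel_freq <- colMeans(selected)
# Bootstrap standard errors that include the selection step
boot_se <- apply(coefs, 2, sd)
print(round(rbind(selected = sel_freq, se = boot_se), 3))
```

A predictor that is kept in only a modest fraction of resamples is a sign that its appearance in the single stepAIC fit is not stable.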

Frank Harrell's book and package go into more detail on this and provide some tools to help (as do several other packages).
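Assuming the package meant here is Harrell's Design, whose successor is rms, a minimal sketch of two of its tools: fastbw() performs fast backward elimination, and validate() with bw = TRUE repeats that elimination inside every bootstrap resample, so the optimism-corrected indexes account for the selection step. The simulated data, n = 100 and B = 200 are hypothetical choices for illustration only, and rms must be installed.

```r
# Sketch assuming the rms package (successor of Design) is installed.
library(rms)

set.seed(2)
n <- 100
d <- data.frame(y  = rbinom(n, 1, 0.4),
                x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))

# x = TRUE, y = TRUE store the design matrix and response for resampling
fit <- lrm(y ~ x1 + x2 + x3, data = d, x = TRUE, y = TRUE)

# Backward elimination (AIC rule) on the original data only
print(fastbw(fit, rule = "aic"))

# bw = TRUE reruns the backward step-down within each bootstrap resample
val <- validate(fit, method = "boot", B = 200, bw = TRUE, rule = "aic")
print(val)
```

The index.corrected column, rather than the apparent index.orig column, is the honest estimate of how the selected model would perform.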

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Peter-Heinz Fox
> Sent: Tuesday, May 05, 2009 8:02 AM
> To: r-help at r-project.org
> Subject: [R] Stepwise logistic regression with significance testing - stepAIC
> 
> Hello R-Users,
> 
> I have one binary dependent variable and a set of independent variables
> (glm(formula, ..., family = "binomial")), and I am using the function
> stepAIC (package "MASS") to choose an optimal model. However, I am not
> sure whether stepAIC takes significance tests such as the likelihood
> ratio test and the Wald test into account (see the example below).
> 
> > library(MASS)
> > y <- rbinom(30,1,0.4)
> > x1 <- rnorm(30)
> > x2 <- rnorm(30)
> > x3 <- rnorm(30)
> > xdata <- data.frame(x1,x2,x3)
> >
> > fit1 <- glm(y~ . ,family="binomial",data=xdata)
> > stepAIC(fit1,trace=FALSE)
> 
> Call:  glm(formula = y ~ x3, family = "binomial", data = xdata)
> 
> Coefficients:
> (Intercept)           x3
>     -0.3556       0.8404
> 
> Degrees of Freedom: 29 Total (i.e. Null);  28 Residual
> Null Deviance:      40.38
> Residual Deviance: 37.86        AIC: 41.86
> >
> > fit <- stepAIC(fit1, trace = FALSE)
> > my.summ <- summary(fit)
> > # Wald test p-values
> > print(coef(my.summ)[, "Pr(>|z|)"])
> (Intercept)          x3
>   0.3609638   0.1395215
> >
> > my.anova <- anova(fit, test = "Chisq")
> > # LR test p-value (column name as in current R versions)
> > print(my.anova[["Pr(>Chi)"]][2])
> [1] 0.1121783
> >
> 
> Is there an alternative function, or some other way, to check whether
> the added variable and the new model are significant at each step of
> the selection?
> 
> Thanks in advance for your help
> 
> Regards
> 
> Peter-Heinz Fox
> 
