[R] Stepwise regression
Marc Schwartz
marc_schwartz at comcast.net
Thu Dec 14 18:02:47 CET 2006
On Thu, 2006-12-14 at 14:37 +0000, Timothy.Mak at iop.kcl.ac.uk wrote:
> Dear all,
>
> I am wondering why the step() procedure in R has the description 'Select a
> formula-based model by AIC'.
>
> I have been using Stata and SPSS and neither package made any reference to
> AIC in its stepwise procedure, and I read from an earlier R-Help post that
> step() is really the 'usual' way for doing stepwise (R Help post from Prof
> Ripley, Fri, 2 Apr 1999 05:06:03 +0100 (BST)).
>
> My understanding of the 'usual' way of doing, say, forward regression is
> that variables whose p value drops below a criterion (commonly 0.05)
> become candidates for being included in the model, and the one with the
> lowest p among these gets chosen, and the step is repeated until all p
> values not in the model are above 0.05, cf. Hosmer and Lemeshow (1989)
> Applied Logistic Regression. The procedure does not require examination of
> the AIC.
>
> I am not well enough acquainted with R to understand the code used in
> step(), so can somebody tell me how step() works?
>
> Thanks very much,
>
> Tim
library(fortunes)
fortune("stepwise")
Frank Harrell: Here is an easy approach that will yield results only
slightly less valid than one actually using the response variable:
x <- data.frame(x1, x2, x3, x4, ..., other potential predictors)
x[ , sample(ncol(x))]
Andy Liaw: Hmm... Shouldn't that be something like:
x[, sample(ncol(x), ceiling(ncol(x) * runif(1)))]
-- Frank Harrell and Andy Liaw (about alternative strategies for
stepwise regression and `random parsimony')
R-help (May 2005)
But seriously, using:
RSiteSearch("stepwise")
will provide links to prior discussions on why stepwise model building
is to be avoided.
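As for the mechanics asked about above: step() does not look at p values.
At each pass it compares the candidate models by AIC and moves to the one
with the lowest value, stopping when no single change lowers the AIC. The
following is only a minimal sketch of that behaviour, not a recommendation.
The built-in mtcars data and the four predictors in the scope are
assumptions chosen purely for illustration.

```r
## Sketch only: forward selection by AIC with step().
## mtcars and the predictors in 'scope' are illustrative assumptions.
null.fit <- lm(mpg ~ 1, data = mtcars)

## With direction = "forward", each pass fits every model that adds one
## term from 'scope' and keeps the addition with the lowest AIC.
## The search stops when no addition lowers the AIC.
fwd <- step(null.fit,
            scope = ~ wt + hp + disp + qsec,
            direction = "forward",
            trace = 0)      # trace = 0 suppresses the per-step printout

formula(fwd)   # the model that was selected
fwd$anova      # the path taken, with the AIC at each step
```

The penalty is set by the k argument: the default k = 2 gives AIC, and
k = log(n) gives what is sometimes called BIC. For a single 1-df term,
k = 2 corresponds roughly to an entry threshold of p = 0.157 in a
likelihood ratio test, rather than the 0.05 used in the procedure
described in the question.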
A copy of Frank's book (more info here):
http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/RmS
will also provide insight.
HTH,
Marc Schwartz