[R] Implementing step-wise linear regression

Bert Gunter gunter.berton at gene.com
Mon Jan 24 18:32:40 CET 2011


FWIW, I think it fair to say that modern statistical practice
generally views stepwise regression as a bad idea, especially in the
hands of non-experts like yourself. The procedures you describe are
"dangerous": they have an uncomfortably high chance of choosing the
wrong variables and of leading to wildly overoptimistic assessments of
the predictive value of the variables that are chosen. This leads to
scientifically irreproducible results, otherwise known as nonsense (in
polite company; I use another impolite term when I am not being nice).
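A small simulation can make the overoptimism concrete. The R sketch below assumes a response that is pure noise and 20 candidate predictors that are also pure noise (n = 50 and 1000 replicates; all of these are arbitrary choices for illustration). For each simulated data set it computes the F-test p-value for adding each candidate alone to lm(y ~ 1), using the single-predictor identity F = r^2 (n - 2) / (1 - r^2), and keeps the smallest one, which is the variable the first step of F-based forward selection would admit.

```r
# Illustration only: every variable here is pure noise (hypothetical settings).
set.seed(42)
n    <- 50     # observations (arbitrary)
p    <- 20     # candidate predictors, all unrelated to y (arbitrary)
reps <- 1000   # simulated data sets (arbitrary)

min_p <- replicate(reps, {
  X <- matrix(rnorm(n * p), nrow = n)   # pure-noise candidate predictors
  y <- rnorm(n)                         # pure-noise response
  r <- cor(X, y)[, 1]                   # correlation of y with each candidate
  Fstat <- r^2 * (n - 2) / (1 - r^2)    # F for lm(y ~ 1) vs lm(y ~ x_j)
  # nominal p-value of the variable that would be admitted first
  min(pf(Fstat, 1, n - 2, lower.tail = FALSE))
})

# Fraction of data sets in which the first step admits a "significant" variable
mean(min_p < 0.05)
```

Because the 20 tests are independent in this setup, that fraction should come out near 1 - 0.95^20, roughly 0.64, rather than the nominal 0.05.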

Shrinkage in its various manifestations is a much better way to
achieve parsimony. See, e.g., the elasticnet, glmnet, pspline, mgcv,
penalized, ... R packages and the MachineLearning task view on CRAN
for various approaches and implementations. Better yet, consult a
local, knowledgeable statistician to help you with this.
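For comparison, here is a minimal sketch of one shrinkage method, ridge regression, written in base R so it needs no extra packages. It is not how glmnet or the other packages above work internally; the function name ridge_coef, the simulated data, and the lambda values are all hypothetical. It standardizes the predictors and solves (X'X + lambda I) beta = X'y, so a larger lambda pulls every coefficient toward zero instead of keeping or dropping variables outright.

```r
# Hypothetical helper: ridge coefficients on standardized predictors.
ridge_coef <- function(X, y, lambda) {
  Xs <- scale(X)                  # center and scale each predictor
  yc <- y - mean(y)               # center the response; intercept is mean(y)
  k  <- ncol(Xs)
  beta <- solve(crossprod(Xs) + lambda * diag(k), crossprod(Xs, yc))
  drop(beta)                      # coefficients on the standardized scale
}

# Simulated data: only x1 and x2 actually affect y.
set.seed(1)
n <- 100
X <- matrix(rnorm(n * 5), nrow = n, dimnames = list(NULL, paste0("x", 1:5)))
y <- 2 * X[, "x1"] - 1 * X[, "x2"] + rnorm(n)

for (lambda in c(0, 10, 100)) {
  cat("lambda =", lambda, ":", round(ridge_coef(X, y, lambda), 3), "\n")
}
```

At lambda = 0 this reproduces ordinary least squares on the standardized predictors. In practice the penalty would be chosen by cross-validation, for example with cv.glmnet() from the glmnet package (alpha = 0 for ridge, alpha = 1 for the lasso); glmnet scales its penalty differently, so its lambda values are not comparable with this sketch.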

Cheers,
Bert

On Mon, Jan 24, 2011 at 12:03 AM, Tal Galili <tal.galili at gmail.com> wrote:
> Hello Troy.
>
> A tiny question (without answering yours): why did you choose to do it
> this way instead of using ?step or ?stepAIC?
>
> Best,
> Tal
>
> On Mon, Jan 24, 2011 at 3:47 AM, Troy S <troysocks-twigs at yahoo.com> wrote:
>
>> Dear R fans,
>>
>> I am trying to do step-wise linear regression using the F-test to decide
>> which variables to admit.  Ewout Steyerberg suggests using the F-test for
>> this purpose.
>>
>> I first build a model with no variables, lm(y ~ 1), and then one with a
>> single variable that is a strong predictor, lm(y ~ x). When I call var.test
>> on these two models, I do not get a significant p-value (0.07). But the
>> summary of the second model gives an F-test p-value that is very small.
>>
>> My questions are:
>>
>> Should I be using var.test to run the F-test to decide which variable to
>> add next?
>>
>> What is the difference between the F-test run by var.test and summary.lm?
>>
>> Has step-wise model building using the F-test been programmed already?
>>
>> Thanks!
>>
>> Troy
>>
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
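On the var.test question, a sketch in R with simulated data (the seed, sample size, and coefficients are hypothetical). For nested linear models the usual F-test is the one computed by anova(m0, m1), and when a single predictor is added it is the same test as the overall F-statistic reported by summary.lm. Given two lm fits, var.test() instead forms the ratio of their residual variances and treats them as independent samples, which residuals from two fits to the same y are not; that would explain why it disagrees with summary.lm.

```r
# Simulated data (hypothetical): y depends strongly on x.
set.seed(123)
n <- 30
x <- rnorm(n)
y <- 1 + 0.8 * x + rnorm(n)

m0 <- lm(y ~ 1)   # intercept-only model
m1 <- lm(y ~ x)   # model with one predictor

# Nested-model F-test: does adding x reduce the residual sum of squares?
a <- anova(m0, m1)
print(a)

# The overall F-statistic from summary.lm for m1 is the same test.
fs <- summary(m1)$fstatistic
p_summary <- pf(fs["value"], fs["numdf"], fs["dendf"], lower.tail = FALSE)
print(p_summary)

# var.test() compares the two residual variances as if from independent samples.
vt <- var.test(m0, m1)
print(vt)
```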
>
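On whether this has been programmed already: in base R, add1() and drop1() accept test = "F" and report the partial F-test for each candidate term, which is the building block of F-based forward or backward selection. step() and MASS::stepAIC() automate the search but choose terms by AIC rather than by F-test p-values. The data below are simulated and hypothetical, and the cautions about stepwise selection apply equally to choices driven by these tables.

```r
# Simulated data (hypothetical): x1 and x2 affect y, x3 does not.
set.seed(7)
n  <- 60
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
df$y <- 1 + 1.5 * df$x1 + 0.7 * df$x2 + rnorm(n)

fit0 <- lm(y ~ 1, data = df)

# One forward step: partial F-test for adding each candidate to fit0.
step1 <- add1(fit0, scope = ~ x1 + x2 + x3, test = "F")
print(step1)

# drop1() gives the corresponding backward-step F-tests for a fitted model.
fit_full <- lm(y ~ x1 + x2 + x3, data = df)
dt <- drop1(fit_full, test = "F")
print(dt)
```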



-- 
Bert Gunter
Genentech Nonclinical Biostatistics


