[R] Stepwise regression scope: all interacting terms (.^2)

Mon Nov 19 15:39:42 CET 2012

David, thanks for the feedback!

Steve, thanks for the direction! I have heard and read a bit about Dr. Harrell's work but had somehow missed the term "penalized logistic regression." That was helpful for finding more specific sources on Dr. Harrell's (and others') suggestions. I may have more questions in the near future.
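
For anyone following the same pointer, here is a rough sketch of what a penalized logistic regression over all main effects and two-way interactions might look like. It assumes the glmnet package, which nobody in this thread named; the data are simulated and the variable names, effect sizes, and choice of lasso penalty (alpha = 1) are only for illustration.

```r
# Simulated data: 10 predictors and a binary response (all made up).
set.seed(42)
n <- 300
d <- as.data.frame(matrix(rnorm(n * 10), n, 10))
names(d) <- paste0("x", 1:10)
y <- rbinom(n, 1, plogis(d$x1 - d$x2 + 0.5 * d$x1 * d$x2))

# Design matrix with all main effects and two-way interactions,
# dropping the intercept column (glmnet fits its own intercept).
X <- model.matrix(~ .^2, data = d)[, -1]
dim(X)

# Assumption: glmnet is installed. The block is skipped otherwise.
if (requireNamespace("glmnet", quietly = TRUE)) {
  # Cross-validated lasso-penalized logistic regression.
  cvfit <- glmnet::cv.glmnet(X, y, family = "binomial", alpha = 1)
  b <- as.matrix(coef(cvfit, s = "lambda.1se"))
  # Show only the coefficients the penalty did not shrink to zero.
  print(b[b[, 1] != 0, , drop = FALSE])
}
```

The penalty shrinks all 55 candidate terms at once instead of adding or dropping them one at a time, which is the contrast with step() that Steve describes below.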

On Nov 16, 2012, at 3:32 PM, Steve Lianoglou wrote:

> Hi Mark,
> 
> To put some context to David's response below, you can search the list
> archives for times when people ask about stepwise regression. You can
> get started here:
> 
> http://search.gmane.org/search.php?group=gmane.comp.lang.r.general&query=stepwise+penalized
> 
> The long and short of it is that you are almost always encouraged to
> use some regularization/penalized model instead of this stepwise
> approach. Frank Harrell, in particular, is generally quite vocal
> against stepwise regression -- I'm actually surprised he hasn't chimed
> in by now, but maybe he's getting a bit tired of fighting the good
> fight -- or, it's close to the holiday and he's taking a break ;-)
> 
> Anyway ... HTH,
> 
> -steve
> 
> On Fri, Nov 16, 2012 at 4:13 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>> 
>> On Nov 16, 2012, at 12:16 PM, Mark Ebbert wrote:
>> 
>>> I haven't heard anything on this question. Is there something fundamentally wrong with my question? Any feedback is appreciated.
>>> 
>> 
>> Perhaps failure to read this sig at the bottom of every posted message to rhelp?
>> 
>> "PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code."
>> 
>> 
>>> Mark
>>> On Nov 15, 2012, at 8:13 AM, Mark T. W. Ebbert wrote:
>>> 
>>>> Dear Gurus,
>>>> 
>>>> Thank you in advance for your assistance. I'm trying to understand scope better when performing stepwise regression using "step."
>> 
>> From the help page of step:
>> "If scope is a single formula, it specifies the upper component, and the lower model is empty. "
>> 
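
A small sketch of what that help-page sentence means in practice, using simulated data (the variable names and effect sizes are invented for illustration). Passing a single formula as scope should behave like passing a list with that formula as the upper bound and an empty lower model.

```r
# Simulated data; names and coefficients are made up.
set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, 1, plogis(0.8 * d$x1 - 0.8 * d$x2))

# Start from the additive model.
fit_add <- glm(y ~ x1 + x2 + x3, family = binomial, data = d)

# Single formula: it is taken as the upper bound; the lower model is empty.
s1 <- step(fit_add, scope = ~ .^2, trace = 0)

# The same bounds written out explicitly as a list.
s2 <- step(fit_add, scope = list(lower = ~ 1, upper = ~ .^2), trace = 0)

formula(s1)
formula(s2)
```

In both calls the search starts at the additive model and may add two-way interactions or drop terms, since the default direction is "both" when a scope is supplied.
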
>>>> I have a model with a binary response variable and 10 predictor variables. When I perform stepwise regression I define scope=.^2 to allow interactions between all terms.
>> 
>> I generally avoid answering questions about stepwise regression, because most of them do not include sufficient background material to justify that strategy. Yours certainly did not.
>> 
>> 
>>>> But I am missing something. When I perform stepwise regression (both directions) on the main model (y~x1+x2+…+x10) the method returns quickly with an answer; however, when I define all interactions in the main model (y~x1+x2+…+x10+x1:x2+x1:x3+…) and then perform stepwise regression (backward only) it runs so long I have to kill it.
>>>> 
>>>> So here's my question: what is the difference between scope=.^2 on the additive (proper term?) model and defining all interactions and doing backward regression? My understanding is that .^2 is supposed to allow all interactions!
>> 
>> Well, I would have guessed that all two-way interactions (all 45 of them in your case) would be included and then successively removed until you reached your specified (arbitrary and most likely incorrectly set) endpoint. I think the Details section of the help page is unclear on this point. I do not think that the 120 potential three-way interactions are part of the scope in that instance, but it should be easy enough for you to test that possibility.
>> 
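
One way to run the test David suggests is to expand the formula with terms() and count the terms by interaction order. This is only a sketch: the one-row data frame is a placeholder whose sole purpose is to give the dot something to expand to, and the names y and x1 to x10 are assumed from the description in the question.

```r
# Placeholder data frame: only the column names matter here.
vars <- paste0("x", 1:10)
d <- as.data.frame(matrix(0, nrow = 1, ncol = 11))
names(d) <- c("y", vars)

# Expand y ~ .^2 and look at the order of each resulting term.
tt <- terms(y ~ .^2, data = d)
ord <- attr(tt, "order")

table(ord)     # 10 main effects, 45 two-way terms
max(ord)       # 2: no three-way terms are generated by .^2

choose(10, 2)  # 45 possible two-way interactions
choose(10, 3)  # 120 possible three-way interactions
```

So a backward search from the full two-way model starts with 55 terms, while a search from the additive model starts with 10 and only considers adding interactions one at a time. That difference in starting point may account for part of the run time reported above.
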
>> --
>> David Winsemius, MD
>> Alameda, CA, USA
>> 
> 
> 
> 
> -- 
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
> | Memorial Sloan-Kettering Cancer Center
> | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact