[R] subset selection for glm

Prof Brian Ripley ripley at stats.ox.ac.uk
Sat Oct 15 19:49:50 CEST 2005


On Sat, 15 Oct 2005, Dhiren DSouza wrote:

> I posted a message earlier about subset selection.
>
> I have a data set with 50 variables x1, x2, .... x50
>
> x50 is a binary response variable that I would like to predict.  Is there a
> library I could use to do an exhaustive search for a subset
> (forward/backward subset selection) of variables to include in the
> regression model?  Any help would be greatly appreciated.

?step (as surely help.search() would have shown you), and btw,
forward/backward selection is not an `exhaustive search' procedure.
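
As a rough illustration of what ?step covers, here is a minimal sketch of
backward and forward selection by AIC for a logistic regression.  The data
are simulated and the names (dat, y, x1..x5, full, null, back, fwd) are
assumptions made for the example, not taken from the original post, where
the response is x50 and the candidates are x1..x49.  step() adds or drops
one term at a time, so it visits only a small fraction of the 2^49 possible
subsets, which is why it is not an exhaustive search.

```r
# Simulated stand-in data: five predictors and a binary response y.
set.seed(1)
n <- 200
dat <- as.data.frame(matrix(rnorm(n * 5), n, 5))
names(dat) <- paste0("x", 1:5)
dat$y <- rbinom(n, 1, plogis(0.8 * dat$x1 - 0.6 * dat$x2))

# Largest and smallest models to be considered.
full <- glm(y ~ ., data = dat, family = binomial)
null <- glm(y ~ 1, data = dat, family = binomial)

# Backward elimination by AIC, starting from the full model.
back <- step(full, direction = "backward", trace = 0)

# Forward selection by AIC, starting from the intercept-only model;
# 'scope' bounds the models that may be visited.
fwd <- step(null,
            scope = list(lower = ~ 1, upper = formula(full)),
            direction = "forward", trace = 0)

# Inspect the selected models.
formula(back)
formula(fwd)
```

The two directions need not agree, and the usual standard errors and
p-values of the selected model take no account of the selection.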

Frank Harrell has posted repeatedly on the dangers of unthinking use of 
such a procedure -- if he does not chime in now, please do look at his 
posts (and if you have access to it, his book).  You have not told us 
*why* you want to do variable selection (which is a more accurate name for 
what you are calling `subset' selection), and for most purposes it is not 
a good idea.


Let me second Roger Bivand's comment earlier today:

> I would, though, appeal to posters to give those who try to reply to
> questions at least a little help, by including an informative signature
> block.

I know that several helpers are quite unlikely to offer help to someone 
sending an unsigned letter, for that is what not using a real user name 
and affiliation amounts to.  So, PLEASE give your credentials -- this 
forum is a free (to the recipients) technical support forum, and that is a 
privilege that should be respected.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



