[R] Handle lot of variables - Regression

Frank E Harrell Jr f.harrell at vanderbilt.edu
Wed Oct 14 20:13:46 CEST 2009

anna0102 wrote:
> Hey,
> I've got a data set (e.g. named Data) which contains a lot of variables, for
> example: s1, s2, ..., s50
> My first question is:
> It is possible to do this: Data$s1
> But is it also possible to do something like this: Data$s1:s50 (I've tried a
> lot of versions of those without a result)
> My second question:
> I want to do a stepwise logistic regression. For this purpose I use the
> following procedures:
> result<-glm(...)
> step(result, direction="forward")
> Now the problem I have is that I have to include all my 50 variables
> (s1-s50), but I don't want to write them all down like y~s1+s2+s3+s4...
> (furthermore it has to be implemented in a loop, so I really need it).
> I've tried to store the 50 variables in a list (e.g. list[[1]]) and tried
> this:
> result<-glm(y ~ list[[1]], ...)
> This works! But if I try to do it stepwise
> result2<-step(result)
> I always get the same results as from glm without a stepwise approach. So
> obviously R can't handle this if you put a list in.
> How can I make this work?
> Thanks in advance,
> Anna


You might as well just take a random sample of your candidate
predictors.  Stepwise regression isn't much better than that.  Note that
if you don't have enough events (say 15 per candidate predictor, i.e.
15 times 50 = 750) to fit a full model then you don't have enough events
to do stepwise regression without appropriate penalization.
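
For the mechanical part of the question, here is a minimal sketch in base R of selecting s1..s50 by name, building the formula without typing it out, and checking the rough 15-events-per-candidate figure before fitting a full model. The simulated data frame, the object names (vars, X, full_formula, events, needed, enough) and the choice to count the less frequent outcome class as "events" are assumptions made for illustration, not something taken from the original post. No penalization is shown here.

```r
# Simulated stand-in for the poster's data: binary outcome y, predictors s1..s50.
set.seed(1)
n <- 400
Data <- as.data.frame(matrix(rnorm(n * 50), nrow = n))
names(Data) <- paste0("s", 1:50)
Data$y <- rbinom(n, 1, plogis(0.5 * Data$s1 - 0.5 * Data$s2))

# Select a range of columns by name; Data$s1:s50 is not valid syntax.
vars <- paste0("s", 1:50)
X <- Data[, vars]

# Build y ~ s1 + s2 + ... + s50 without writing it out by hand.
full_formula <- reformulate(vars, response = "y")

# Rough check of the 15-events-per-candidate rule of thumb.
# Assumption: "events" is taken as the size of the less frequent outcome class.
events <- min(table(Data$y))
needed <- 15 * length(vars)   # 15 * 50 = 750
enough <- events >= needed

# Fit the full (unpenalized) model only when the event count supports it.
if (enough) {
  fit <- glm(full_formula, data = Data, family = binomial)
}
```

Because the formula is built from a character vector, the same lines can sit inside a loop over different sets of candidate names. With only 400 simulated rows the check fails, which is the situation the note above warns about.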


Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University
