[R] Handle lot of variables - Regression
Frank E Harrell Jr
f.harrell at vanderbilt.edu
Wed Oct 14 20:13:46 CEST 2009
anna0102 wrote:
> Hey,
>
> I've got a data set (e.g. named Data) which contains a lot of variables, for
> example: s1, s2, ..., s50
>
> My first question is:
> It is possible to do this: Data$s1
> But is it also possible to do something like this: Data$s1:s50 (I've tried a
> lot of versions of those without a result)
>
> My second question:
> I want to do a stepwise logistic regression. For this purpose I use the
> following procedures:
> result<-glm(...)
> step(result, direction="forward")
>
> Now the problem I have, is, that I have to include all my 50 variables
> (s1-s50), but I don't want to write them all down like y~s1+s2+s3+s4...
> (furthermore it has to be implemented in a loop, so I really need it).
> I've tried to store the 50 variables in a list (e.g. list[[1]]) and tried
> this:
> result<-glm(y ~ list[[1]], ...)
> This works! But if I try to do it stepwise
> result2<-step(result)
> I always get the same results as from glm without a stepwise approach. So
> obviously R can't handle this if you put a list in.
> How can I make this work?
>
> Thanks in advance,
> Anna
>
Anna,
You might as well just take a random sample of your candidate
predictors. Stepwise regression isn't much better than that. Note that
if you don't have enough events to fit a full model (say 15 events per
candidate predictor, i.e. 15 times 50 = 750), then you don't have enough
events to do stepwise regression without appropriate penalization.
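
For the mechanical part of the question, and to make the rule of thumb above concrete, here is a minimal sketch in base R. The simulated `Data`, the names `vars`, `X`, `full_formula`, `events`, `needed` and `enough` are hypothetical and only for illustration. Counting "events" as the size of the less frequent outcome class is an assumption on my part, not something stated in the thread.

```r
# Hypothetical stand-in for the poster's data set: y plus s1, ..., s50.
set.seed(1)
n <- 400
Data <- as.data.frame(matrix(rnorm(n * 50), nrow = n))
names(Data) <- paste0("s", 1:50)
Data$y <- rbinom(n, 1, plogis(0.8 * Data$s1 - 0.5 * Data$s2))

# Data$s1:s50 is not valid R; select the columns by name instead.
vars <- paste0("s", 1:50)
X <- Data[, vars]

# Build y ~ s1 + s2 + ... + s50 without typing it out.
full_formula <- reformulate(vars, response = "y")

# Rule of thumb from the reply: about 15 events per candidate predictor.
# Assumption: "events" = number of cases in the less frequent outcome class.
events <- min(sum(Data$y == 1), sum(Data$y == 0))
needed <- 15 * length(vars)
enough <- events >= needed
```

If `enough` is TRUE, the full model can be fitted with glm(full_formula, data = Data, family = binomial). If it is FALSE, as it is for this simulated example, the data do not support the full model, and by the argument above they do not support unpenalized stepwise selection either.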
Frank
--
Frank E Harrell Jr, Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University