[R] Handle lot of variables - Regression

Dieter Menne dieter.menne at menne-biomed.de
Wed Oct 14 16:23:04 CEST 2009




anna0102 wrote:
> 
> I've got a data set (e.g. named Data) which contains a lot of variables,
> for example: s1, s2, ..., s50
> 
> My first question is:
> It is possible to do this: Data$s1
> But is it also possible to do something like this: Data$s1:s50 (I've tried
> a lot of versions of those without a result)
> 
> 
Use the [] notation. For example

Data[,c("s1","s2","s3")]

or even better

Data[,grep("^s[0-9]+$",names(Data),value=TRUE)]
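
A minimal sketch of the pattern-based selection, assuming a made-up data
frame (the columns y, sex and s1..s3 are only for illustration). Anchoring
the pattern matters: a loose pattern such as "s.*" also matches any name
that merely contains an "s", for example "sex".

```r
set.seed(1)
# Hypothetical toy data; all column names are assumptions for illustration
Data <- data.frame(y = rbinom(20, 1, 0.5),
                   sex = gl(2, 10),
                   s1 = rnorm(20), s2 = rnorm(20), s3 = rnorm(20))

# Anchored pattern: "s" followed only by digits, so "sex" is not picked up
svars <- grep("^s[0-9]+$", names(Data), value = TRUE)
sub <- Data[, svars]

# Equivalent when the names are known to run s1..s3
sub2 <- Data[, paste0("s", 1:3)]
```

The paste0() form is probably the simplest one when the variables really
are numbered consecutively.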



anna0102 wrote:
> 
> I want to do a stepwise logistic regression. For this purpose I use the
> following procedures:
> result<-glm(...)
> step(result, direction="forward")
> 
> Now the problem I have, is, that I have to include all my 50 variables
> (s1-s50), but I don't want to write them all down like y~s1+s2+s3+s4...
> (furthermore it has to be implemented in a loop, so I really need it).
> 

Construct the formula dynamically. But please, start with only 3 or 4
variables and check that it works. Sometimes deep inside functions things
can go wrong with this method, requiring Ripley's game-like workarounds. See

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/16599.html


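a=data.frame(s=1:10,s2=1:10,s4=1:10)
# collapse the predictor names first, then prepend the response once;
# pasting "z~" onto every name would give "z~ s+z~ s2+z~ s4"
form = as.formula(paste("z~",paste(grep("^s",names(a),value=TRUE),collapse="+")))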
glm(form,....)
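
A slightly fuller sketch of the same idea, under some assumptions: the toy
data frame, the response name z and family=binomial (taken from "logistic
regression" in the question) are made up for illustration. reformulate()
from the stats package builds the formula from a character vector, and for
direction="forward" step() needs the full formula as the upper scope while
starting from a smaller model.

```r
set.seed(1)
# Hypothetical toy data: binary response z and predictors s1..s4
a <- data.frame(z = rbinom(100, 1, 0.5),
                s1 = rnorm(100), s2 = rnorm(100),
                s3 = rnorm(100), s4 = rnorm(100))

# Predictor names selected by an anchored pattern
preds <- grep("^s[0-9]+$", names(a), value = TRUE)

# reformulate() builds z ~ s1 + s2 + s3 + s4 from the character vector
full <- reformulate(preds, response = "z")

# Forward selection starts from the intercept-only model; the full
# formula is passed as the upper end of the scope
null_fit <- glm(z ~ 1, family = binomial, data = a)
sel <- step(null_fit, scope = list(lower = ~ 1, upper = full),
            direction = "forward", trace = 0)
```

Inside a loop, only preds (or the data frame) would need to change from one
iteration to the next; the rest can stay as it is.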

And be aware of the nonsense you can (replace by will certainly) get with
stepwise regression and so many parameters. If I were to be treated by a
cure created by stepwise regression, I would prefer voodoo.

Search for "Harrell stepwise" and read Frank's well-justified soapboxes.

Dieter
