[R] Logistic Regression - Variable Selection Methods With Prediction

RAJ dheerajathreya at gmail.com
Wed Oct 26 01:54:17 CEST 2011


Hello,

I am pretty new to R; I have always used SAS and SAS products. My
target variable is binary ('Y' and 'N') and I have about 14 predictor
variables. My goal is to compare different variable selection methods,
such as forward, backward, and all possible subsets. I am using the
misclassification rate to pick the winning method.

This is what I have as of now:

Reg <- glm(Graduation ~ ., DFtrain, family = binomial(link = "logit"))
step <- extractAIC(Reg, direction = "forward")
pred <- predict(Reg, DFtest, type = "response")
mis <- mean({pred > 0.5} != {DFtest[, "Graduation"] == "Y"})

This program runs, but I want to check that I am doing this right.
Also, I am getting the same misclassification rate for all the
different methods.
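
A likely reason the rates coincide: extractAIC() only reports the
equivalent degrees of freedom and the AIC of the model it is given (the
direction argument is silently absorbed by "..."), and predict() is
still called on Reg, the full model. The selection itself is done by
step(), which returns the chosen model. The sketch below is a minimal
illustration, not a definitive implementation: the data are simulated,
and the column names x1..x4 and the helper misclass() are hypothetical
stand-ins for the real DFtrain/DFtest.

```r
# Simulated stand-in for the poster's data (x1..x4 are hypothetical names).
set.seed(1)
n <- 400
DF <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
p <- plogis(-0.5 + 1.5 * DF$x1 - 1.0 * DF$x2)
DF$Graduation <- factor(ifelse(runif(n) < p, "Y", "N"), levels = c("N", "Y"))
train <- sample(n, 300)
DFtrain <- DF[train, ]
DFtest <- DF[-train, ]

# Full and intercept-only models bound the search.
full <- glm(Graduation ~ ., data = DFtrain, family = binomial(link = "logit"))
null <- glm(Graduation ~ 1, data = DFtrain, family = binomial(link = "logit"))

# step() performs the AIC-based search and returns the selected glm.
fwd <- step(null, scope = list(lower = formula(null), upper = formula(full)),
            direction = "forward", trace = 0)
bwd <- step(full, direction = "backward", trace = 0)

# Hypothetical helper: test-set misclassification rate at a 0.5 cutoff.
misclass <- function(model, newdata) {
  pred <- predict(model, newdata, type = "response")  # estimated P(Graduation == "Y")
  mean((pred > 0.5) != (newdata$Graduation == "Y"))
}

# Predict from each selected model, not from the full model every time.
rates <- c(full = misclass(full, DFtest),
           forward = misclass(fwd, DFtest),
           backward = misclass(bwd, DFtest))
print(rates)
```

Forward and backward search can still end at the same model, in which
case identical rates are expected; comparing formula(fwd) with
formula(bwd) shows whether that happened.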

I also tried to use

Reg <- leaps(Graduation ~ ., DFtrain)
pred <- predict(Reg, DFtest, type = "response")
mis <- mean({pred > 0.5} != {DFtest[, "Graduation"] == "Y"})
# print(summary(mis))

which doesn't work,
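
One reason the call above fails is that leaps() from the leaps package
does not take a formula: it expects a numeric predictor matrix x and a
numeric response y, and it fits least-squares models rather than
logistic ones. The following sketch shows the calling convention only,
on simulated data; the 0/1 coding of the response and the names X, y,
and selected are assumptions made for illustration.

```r
library(leaps)  # provides leaps()

# Simulated data: three numeric predictors and a 0/1 response.
set.seed(3)
n <- 200
X <- matrix(rnorm(n * 3), ncol = 3,
            dimnames = list(NULL, c("x1", "x2", "x3")))
y <- as.numeric(runif(n) < plogis(1.2 * X[, "x1"]))

# Matrix interface: best subset of each size, ranked by Mallows' Cp.
out <- leaps(x = X, y = y, method = "Cp", nbest = 1)

# Each row of out$which flags the predictors in one candidate subset.
best <- which.min(out$Cp)
selected <- colnames(X)[out$which[best, ]]
print(selected)
```

The result is a plain list of subsets and criterion values, so there is
no fitted model for predict() to use; the selected columns have to be
refitted separately.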

and

Reg <- regsubsets(Graduation ~ ., DFtrain)
pred <- predict(Reg, DFtest, type = "response")
mis <- mean({pred > 0.5} != {DFtest[, "Graduation"] == "Y"})
# print(summary(mis))

regsubsets() runs, but the predict() function does not work with it.
Is there any other way to do predictions when using regsubsets()?
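
The leaps package does not supply a predict() method for regsubsets
objects, and regsubsets() searches over least-squares fits, not
logistic ones. One possible workaround, sketched here under stated
assumptions, is to use regsubsets() only to pick the variables and then
refit that subset with glm(), which does support predict(). The data
are simulated; the 0/1 recoding (y01), the use of BIC to choose the
subset size, and the names lin_train, keep, and refit are all
illustrative choices, not part of the original program.

```r
library(leaps)  # provides regsubsets()

# Simulated stand-in for the poster's data (x1..x4 are hypothetical names).
set.seed(2)
n <- 400
DF <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
DF$Graduation <- factor(ifelse(runif(n) < plogis(1.5 * DF$x1 - DF$x2), "Y", "N"),
                        levels = c("N", "Y"))
train <- sample(n, 300)
DFtrain <- DF[train, ]
DFtest <- DF[-train, ]

# regsubsets() needs a numeric response, so recode Y/N as 1/0 (assumption).
predictors <- setdiff(names(DFtrain), "Graduation")
lin_train <- DFtrain[, predictors]
lin_train$y01 <- as.numeric(DFtrain$Graduation == "Y")

# Exhaustive search over subsets of every size.
subs <- regsubsets(y01 ~ ., data = lin_train, nvmax = length(predictors))
sm <- summary(subs)

# Pick the subset size with the lowest BIC and read off its variables.
best <- which.min(sm$bic)
keep <- names(which(sm$which[best, -1]))  # drop the "(Intercept)" column

# Refit the chosen subset as a logistic regression, then predict as usual.
refit <- glm(reformulate(keep, response = "Graduation"),
             data = DFtrain, family = binomial(link = "logit"))
pred <- predict(refit, DFtest, type = "response")
mis <- mean((pred > 0.5) != (DFtest$Graduation == "Y"))
print(mis)
```

With factor predictors the columns of sm$which are dummy-variable names
rather than variable names, so the mapping back to the original columns
would need extra care.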

Any help is appreciated.

Thanks,


