[R] glmnet with binary predictors
Nick Sabbe
nick.sabbe at ugent.be
Thu Feb 3 12:03:08 CET 2011
Hello Sambit.
Step1:
Create a matrix out of your predictor data, with one column for every
predictor, coded 1 for yes and 0 for no. The matrix should have a row for
each observation (called pred.mat below).
Besides that, you need a vector with the outcome variable for each
observation (best if this is a factor with 2 levels) (called out.v below)
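As a rough sketch of Step1, assuming the data sit in a data frame with "yes"/"no" strings: the data frame credit and its column names below are made up for illustration and are not from your dataset.

```r
# Hypothetical toy data standing in for the real scorecard data:
# one good/bad outcome column and three yes/no predictors.
set.seed(1)
n <- 200
credit <- data.frame(
  history   = sample(c("good", "bad"), n, replace = TRUE),
  owns_home = sample(c("yes", "no"), n, replace = TRUE),
  has_job   = sample(c("yes", "no"), n, replace = TRUE),
  defaulted = sample(c("yes", "no"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Every column except the outcome is a predictor.
pred.cols <- setdiff(names(credit), "history")

# One column per predictor, 1 for "yes" and 0 for "no", one row per observation.
pred.mat <- sapply(credit[pred.cols], function(col) as.numeric(col == "yes"))

# Outcome as a factor with 2 levels.
out.v <- factor(credit$history)

dim(pred.mat)
table(out.v)
```

Keeping the column names on pred.mat makes it easier to identify the predictors later on.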
Step2
Because you are working with categorical variables, don't forget to always
use " standardize = FALSE " in any call to the glmnet functions (see the
docs)
Step3
To see how the predictor coefficients move over different values of your
penalization parameter, simply do something like
myLognet<-glmnet(x=pred.mat, y=out.v, standardize = FALSE,
family="binomial")
and then
plot(myLognet, xvar= "lambda", label = TRUE)
Note: the labels in the plot indicate column numbers in pred.mat
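Since the plot only shows column numbers, a small lookup table helps to translate them back to predictor names. The sketch below uses simulated data as a stand-in for pred.mat and out.v (the names x1..x6 and the penalty value 0.05 are arbitrary choices for illustration).

```r
library(glmnet)

# Simulated stand-in for the real data: 6 binary predictors,
# of which only the first two influence the outcome.
set.seed(42)
n <- 300
p <- 6
pred.mat <- matrix(rbinom(n * p, 1, 0.5), nrow = n,
                   dimnames = list(NULL, paste0("x", 1:p)))
lin.pred <- -1 + 2 * pred.mat[, 1] - 1.5 * pred.mat[, 2]
out.v <- factor(ifelse(runif(n) < plogis(lin.pred), "bad", "good"))

myLognet <- glmnet(x = pred.mat, y = out.v, standardize = FALSE,
                   family = "binomial")
plot(myLognet, xvar = "lambda", label = TRUE)

# Lookup table: plot label (column number) -> predictor name.
label.key <- data.frame(label = seq_len(ncol(pred.mat)),
                        predictor = colnames(pred.mat))
print(label.key)

# Coefficients at a single penalty value (0.05 is only an example).
coef(myLognet, s = 0.05)
```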
Step4
To find the 'best' value of the penalization parameter, use cv.glmnet with
the same parameters plus a type (see ?cv.glmnet; in more recent glmnet
versions this argument is called type.measure). Note: if the criterion you
want is not provided 'out of the box', it will take you quite a bit of
coding, so if you can, take one of the provided ones.
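Visually, you can select the 'best' value for the penalization parameter
from the plot (see ?plot.cv.glmnet), or you can read it off numerically: the
object returned by cv.glmnet stores the value that optimizes the criterion
(lambda.min) and a more heavily penalized value whose criterion is within one
standard error of that optimum (lambda.1se).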
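Step4 could look roughly like the sketch below, again on simulated stand-in data. It assumes a glmnet version where the criterion argument is named type.measure; the choice of "deviance", 10 folds and lambda.1se is only one reasonable option, not a recommendation for your data.

```r
library(glmnet)

# Simulated stand-in for pred.mat and out.v (hypothetical data).
set.seed(42)
n <- 300
p <- 6
pred.mat <- matrix(rbinom(n * p, 1, 0.5), nrow = n,
                   dimnames = list(NULL, paste0("x", 1:p)))
lin.pred <- -1 + 2 * pred.mat[, 1] - 1.5 * pred.mat[, 2]
out.v <- factor(ifelse(runif(n) < plogis(lin.pred), "bad", "good"))

# Cross-validate over the penalty path with the same settings as the fit.
cvfit <- cv.glmnet(x = pred.mat, y = out.v, standardize = FALSE,
                   family = "binomial", type.measure = "deviance",
                   nfolds = 10)

# Visual selection: CV curve with error bars.
plot(cvfit)

# Numerical selection.
cvfit$lambda.min   # penalty with the best CV criterion
cvfit$lambda.1se   # largest penalty within 1 SE of that optimum

# Predictors with a non-zero coefficient at the chosen penalty.
b <- as.matrix(coef(cvfit, s = "lambda.1se"))
selected <- setdiff(rownames(b)[b[, 1] != 0], "(Intercept)")
selected
```

The predictors left in selected are the ones the penalized fit keeps at that penalty, which is the variable selection you were asked for.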
Really boilerplate, I guess.
Good luck.
Nick Sabbe
--
ping: nick.sabbe at ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36
-- Do Not Disapprove
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of sambit rath
Sent: Thursday 3 February 2011 10:58
To: r-help at r-project.org
Subject: [R] glmnet with binary predictors
Hi Everybody!
I must start with a declaration that I am a sparse user of R. I am
creating a credit scorecard using a dataset which has a variable
depicting actual credit history (good/bad) and 41 other variables of
yes/no type. The procedure I am asked to follow is to use a penalized
logistic procedure for variable selection. I have located the package
"glmnet" which gives the complete elasticnet regularization path for
logistic models. I want some help in setting up the process.
Can someone point out the basic steps?
Thanks
Sambit
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.