[R] A question on glmnet analysis

細田弘吉 khosoda at med.kobe-u.ac.jp
Fri Mar 25 14:04:22 CET 2011


Hi,
I am trying to fit a logistic regression to data from 104 patients, with
one binary outcome (yes or no) and 15 variables (9 categorical factors
[yes or no] and 6 continuous variables). The outcome is "yes" in 25 patients.
With 25 events and 15 variables, the number of events per variable (about
1.7) is far below the usual rule of thumb of 10. Therefore, I tried to analyze
the data with a penalized regression method. I would appreciate help from the
experts here.

First, I standardized all 6 continuous variables using scale() with
center=TRUE and scale=TRUE. The nine categorical variables and the
outcome variable were recoded as 0 or 1. Then I used glmnet with
standardize=FALSE, because the matrix contains the 0/1 categorical
variables and the continuous variables are already standardized.
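
The preprocessing described above can be sketched as follows. This is only an illustration on simulated data: the data frame `dat` and the column names `age`, `size_mm` and `smoker` are made up and stand in for the real patient variables. Building the matrix with cbind() rather than matrix(c(...)) keeps the column names, so the coefficients are labeled later.

```r
# Hypothetical data standing in for the real patient table (all names are made up).
set.seed(1)
n <- 104
dat <- data.frame(
  age     = rnorm(n, 60, 10),                                   # continuous
  size_mm = rnorm(n, 8, 3),                                     # continuous
  smoker  = factor(sample(c("no", "yes"), n, replace = TRUE)),  # yes/no factor
  outcome = factor(sample(c("no", "yes"), n, replace = TRUE))   # yes/no outcome
)

cont <- c("age", "size_mm")  # columns to standardize
bin  <- "smoker"             # columns to recode as 0/1

# Center and scale the continuous columns (mean 0, standard deviation 1).
x_cont <- scale(dat[, cont], center = TRUE, scale = TRUE)

# Recode each yes/no factor as 1/0.
x_bin <- sapply(dat[, bin, drop = FALSE], function(f) as.integer(f == "yes"))

# Combine into one numeric matrix; column names are preserved.
x <- cbind(x_cont, x_bin)
y <- as.integer(dat$outcome == "yes")

print(round(colMeans(x[, cont]), 10))
print(apply(x[, cont], 2, sd))
```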

x15std <- matrix(c(x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15),
104, 15)
y <- outcome
library(glmnet)
fit.1 <- glmnet(x15std, y, family="binomial", standardize=FALSE)
fit.1cv <- cv.glmnet(x15std, y, family="binomial", standardize=FALSE)

The default is alpha=1, so this is the lasso penalty.

Coefficients.fit1 <- coef(fit.1, s=fit.1cv$lambda.min)
Active.Index.fit1 <- which(Coefficients.fit1 !=0)
Active.Coefficients.fit1 <- Coefficients.fit1[Active.Index.fit1]
Active.Index.fit1
[1]  1  5  9 10 16
Active.Coefficients.fit1
[1] -1.28774827  0.01420395  0.70444865 -0.27726625  0.18455926

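At lambda.min the model has 5 nonzero coefficients: the intercept (index 1)
and 4 covariates. Since the intercept comes first, indices 5, 9, 10 and 16
correspond to columns 4, 8, 9 and 15 of x15std.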
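
A minimal sketch of how the nonzero coefficients could be listed by name rather than by index, so that the offset caused by the intercept does not have to be tracked by hand. It runs on simulated data, not on my patient data; the column names `x1` to `x15` and the simulated outcome are assumptions made only for illustration.

```r
library(glmnet)

# Simulated stand-in for the real 104 x 15 matrix (names x1..x15 are hypothetical).
set.seed(42)
n <- 104
p <- 15
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
y <- rbinom(n, 1, plogis(-1.2 + 0.8 * x[, "x8"]))

# Cross-validated lasso (alpha = 1 is the default).
cvfit <- cv.glmnet(x, y, family = "binomial", standardize = FALSE)

# coef() returns a sparse one-column matrix; convert it to a named numeric vector.
cf <- as.matrix(coef(cvfit, s = "lambda.min"))[, 1]

# Keep only the nonzero entries; the names show which variables are active.
active <- cf[cf != 0]
print(active)
```

Because cv.glmnet assigns the folds at random, lambda.min and the active set can differ from run to run unless the seed or the fold assignment is fixed.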

Second, I repeated the analysis with alpha=0.5 to fit an elastic net
model.

fit.2 <- glmnet(x15std, y, family="binomial", standardize=FALSE, alpha=0.5)
fit.2cv <- cv.glmnet(x15std, y, family="binomial", standardize=FALSE,
alpha=0.5)
Coefficients.fit2 <- coef(fit.2, s=fit.2cv$lambda.min)
Active.Index.fit2 <- which(Coefficients.fit2 !=0)
Active.Coefficients.fit2 <- Coefficients.fit2[Active.Index.fit2]
Active.Index.fit2
[1]  1  5  9 10 16
Active.Coefficients.fit2
[1] -1.3286190  0.1410739  0.6315108 -0.2668022  0.2292459

This model selected the same nonzero coefficients (the intercept and the
same 4 covariates) as the first model with the lasso penalty.

My questions are as follows:
1. Am I doing this correctly?
2. Which model, lasso or elastic net, should be selected, and why? Both
models chose the same variables but gave different coefficient values.
3. Is it O.K. to calculate odds ratios as exp(coefficients)? And how can
I calculate a 95% confidence interval for the odds ratio?
Or is a 95% CI meaningless in this kind of analysis?
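
To make question 3 concrete, the calculation I have in mind is sketched below with the lasso coefficients reported above. The labels `x4`, `x8`, `x9` and `x15` are only positional (coefficient index minus 1 for the intercept), not real variable names. Whether these shrunken estimates can be interpreted as odds ratios is exactly what I am unsure about.

```r
# Lasso coefficients at lambda.min as reported above, intercept dropped.
# Labels are positional (coefficient index minus 1), not real variable names.
beta <- c(x4 = 0.01420395, x8 = 0.70444865, x9 = -0.27726625, x15 = 0.18455926)

# Exponentiating a log-odds coefficient gives an odds ratio.
odds_ratio <- exp(beta)
print(round(odds_ratio, 3))
```

For any of these that are standardized continuous variables, the value would be the odds ratio per one standard deviation; for the 0/1 variables it would compare yes with no.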

Thank you in advance for your help.
KH


