[R] A question on glmnet analysis

Nick Sabbe nick.sabbe at ugent.be
Fri Mar 25 14:40:29 CET 2011


I haven't read all of your code, but at first read, it seems right.

With regard to your questions:
1. Am I doing it correctly or not?
Seems OK, as I said. You could use more standard code to convert your
data to a matrix (cbind() or model.matrix(), for instance), but the
results should be essentially the same.
Also, lambda.min may be a tad too optimistic: to correct for the reuse of
data in cross-validation, one normally uses the one-standard-error rule,
i.e. the largest lambda whose cross-validated error is within one standard
error of the minimum (I think this is described in the help file for
cv.glmnet, and it is also present in the cv.glmnet return value, as
lambda.1se if I'm not mistaken).
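
A minimal sketch of both points, on simulated data standing in for the
real 104 patients (the data frame `dat` and its column names are made up
for illustration): model.matrix() builds the predictor matrix, and
cv.glmnet() returns both lambda.min and lambda.1se.

```r
library(glmnet)

set.seed(1)
n <- 104

# Simulated stand-in for the real data: 6 standardized continuous
# covariates (c1..c6) and 9 binary covariates coded 0/1 (b1..b9).
cont <- as.data.frame(scale(matrix(rnorm(n * 6), n, 6)))
names(cont) <- paste0("c", 1:6)
bin <- as.data.frame(matrix(rbinom(n * 9, 1, 0.4), n, 9))
names(bin) <- paste0("b", 1:9)
dat <- cbind(cont, bin)
dat$outcome <- rbinom(n, 1, plogis(-1.5 + dat$c1 + dat$b1))

# model.matrix() expands the formula into a numeric matrix;
# the intercept column is dropped because glmnet adds its own.
x <- model.matrix(outcome ~ ., data = dat)[, -1]
y <- dat$outcome

cvfit <- cv.glmnet(x, y, family = "binomial", standardize = FALSE)

cvfit$lambda.min              # lambda with the smallest CV deviance
cvfit$lambda.1se              # largest lambda within one SE of that minimum
coef(cvfit, s = "lambda.1se") # coefficients at the more conservative lambda
```

lambda.1se is never smaller than lambda.min, so it gives a model that is
at least as sparse.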

2. Which model, I mean lasso or elastic net, should be selected? and
why? Both models chose the same variables but different coefficient values.
You may want to read 'The Elements of Statistical Learning' for some
info on how ridge, lasso and elastic net compare. Lasso should work fine
in this relatively low-dimensional setting, although it depends on the
correlation structure of your covariates.
Depending on your goals, you may want to refit a standard logistic
regression with only the variables selected by the lasso: this avoids the
shrinkage toward zero that biases the coefficients in (just about) every
penalized regression.
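
The refit could look roughly like this. It is only a sketch on simulated
data (the matrix `x`, its column names x1..x15 and the outcome `y` are
invented here), and the standard errors that summary(refit) would report
ignore the selection step, so they should not be read as valid inference.

```r
library(glmnet)

set.seed(1)
n <- 104

# Simulated stand-in: 6 continuous and 9 binary covariates.
x <- cbind(matrix(rnorm(n * 6), n, 6),
           matrix(rbinom(n * 9, 1, 0.4), n, 9))
colnames(x) <- paste0("x", 1:15)
y <- rbinom(n, 1, plogis(-1.5 + x[, "x1"] + x[, "x7"]))

cvfit <- cv.glmnet(x, y, family = "binomial", standardize = FALSE)

# Names of the covariates with a nonzero lasso coefficient at lambda.min.
beta <- as.matrix(coef(cvfit, s = "lambda.min"))
selected <- setdiff(rownames(beta)[beta[, 1] != 0], "(Intercept)")

# Unpenalized logistic regression on the selected covariates only.
if (length(selected) > 0) {
  refit_data <- data.frame(y = y, x[, selected, drop = FALSE])
  refit <- glm(y ~ ., data = refit_data, family = binomial)
  print(coef(refit))
}
```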

3. Is it O.K. to calculate odds ratio by exp(coefficients)? And how can
you calculate 95% confidence interval of odds ratio?
Or 95%CI is meaningless in this kind of analysis?
At this time, confidence intervals for lasso/elnet in GLM settings are an
open problem (the reason being that the L1 penalty is not differentiable
at zero). Some 'solutions' exist (the bootstrap, for one), but they have
all been shown to have (statistical) properties that make them, at the
least, doubtful. I know, because I'm working on this. Short answer: there
is no way to do this (at this time).
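
As for the first half of the question: in a logistic model each
coefficient is a log odds ratio, so exp() of a coefficient gives an odds
ratio point estimate. With a penalized fit that estimate is shrunk toward
1, and for the standardized continuous covariates it is per standard
deviation. A sketch on simulated data (all names are made up for
illustration), giving point estimates only:

```r
library(glmnet)

set.seed(1)
n <- 104

# Simulated stand-in: 6 continuous and 9 binary covariates.
x <- cbind(matrix(rnorm(n * 6), n, 6),
           matrix(rbinom(n * 9, 1, 0.4), n, 9))
colnames(x) <- paste0("x", 1:15)
y <- rbinom(n, 1, plogis(-1.5 + x[, "x1"] + x[, "x7"]))

cvfit <- cv.glmnet(x, y, family = "binomial", standardize = FALSE)

# Named vector of coefficients at lambda.min; drop the intercept and
# the coefficients that the lasso set to exactly zero.
beta <- as.matrix(coef(cvfit, s = "lambda.min"))[, 1]
beta <- beta[names(beta) != "(Intercept)" & beta != 0]

# Shrunken odds ratio point estimates, with no interval attached.
odds_ratios <- exp(beta)
print(odds_ratios)
```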

HTH (and hang on there in Japan),


Nick Sabbe
--
ping: nick.sabbe at ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36

-- Do Not Disapprove



-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of ????
Sent: vrijdag 25 maart 2011 14:04
To: r-help at stat.math.ethz.ch
Subject: [R] A question on glmnet analysis

Hi,
I am trying to do logistic regression on data from 104 patients, which
have one outcome (yes or no) and 15 variables (9 categorical factors
[yes or no] and 6 continuous variables). The number of yes outcomes is 25.
Twenty-five events and 15 variables mean that the number of events per
variable is much less than 10. Therefore, I tried to analyze the data with
a penalized regression method. I would be grateful if some of the experts
here could help me.

First of all, I standardized all 6 continuous variables by scale() with
the center=TRUE and scale=TRUE options. The nine categorical variables and
the outcome variable were re-coded as 0 or 1. Then, I used glmnet with the
standardize=FALSE option because of the presence of categorical variables.

x15std <- matrix(c(x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15),
104, 15)
y <- outcome
library(glmnet)
fit.1 <- glmnet(x15std, y, family="binomial", standardize=FALSE)
fit.1cv <- cv.glmnet(x15std, y, family="binomial", standardize=FALSE)

The default is alpha=1, so this should be the lasso penalty.

Coefficients.fit1 <- coef(fit.1, s=fit.1cv$lambda.min)
Active.Index.fit1 <- which(Coefficients.fit1 !=0)
Active.Coefficients.fit1 <- Coefficients.fit1[Active.Index.fit1]
Active.Index.fit1
[1]  1  5  9 10 16
Active.Coefficients.fit1
[1] -1.28774827  0.01420395  0.70444865 -0.27726625  0.18455926

My optimal model has 5 nonzero coefficients: the intercept (index 1) and
4 active covariates.

Second, I did the same things with alpha=0.5 option to do elastic net
analysis.

fit.2 <- glmnet(x15std, y, family="binomial", standardize=FALSE, alpha=0.5)
fit.2cv <- cv.glmnet(x15std, y, family="binomial", standardize=FALSE,
alpha=0.5)
Coefficients.fit2 <- coef(fit.2, s=fit.2cv$lambda.min)
Active.Index.fit2 <- which(Coefficients.fit2 !=0)
Active.Coefficients.fit2 <- Coefficients.fit2[Active.Index.fit2]
Active.Index.fit2
[1]  1  5  9 10 16
Active.Coefficients.fit2
[1] -1.3286190  0.1410739  0.6315108 -0.2668022  0.2292459

This model has the same 5 nonzero coefficients (the intercept and the
same 4 covariates) as the first one with the lasso penalty.

My questions are as follows:
1. Am I doing it correctly or not?
2. Which model, I mean lasso or elastic net, should be selected? and
why? Both models chose the same variables but different coefficient values.
3. Is it O.K. to calculate odds ratio by exp(coefficients)? And how can
you calculate 95% confidence interval of odds ratio?
Or 95%CI is meaningless in this kind of analysis?

I would appreciate your help in advance.
KH
