[R] A question on glmnet analysis
細田弘吉
khosoda at med.kobe-u.ac.jp
Sat Mar 26 14:46:25 CET 2011
Hi Nick,
Thanks a lot for your quick response.
Sorry for the delayed response because of the time difference between us.
(11/03/25 22:40), Nick Sabbe wrote:
> > I haven't read all of your code, but at first read, it seems right.
> >
> > With regard to your questions:
> > 1. Am I doing it correctly or not?
> > Seems OK, as I said. You could use some more standard code to convert
> > your data to a matrix, but essentially the results should be the same.
> > Also, lambda.min may be a tad too optimistic: to correct for the reuse of
> > data in cross-validation, one normally uses the minus-one-SE trick (I
> > think this is described in the help file for cv.glmnet, and that is also
> > present in the cv.glmnet return value (lambda.1se if I'm not mistaken)).
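Just to check that I understand the first point: is the following the kind
of more standard conversion you mean? (This is only a rough sketch on my
side; it assumes my data frame MyData holds the binary outcome 'outcome'
together with the 15 predictors.)

library(glmnet)

## build the design matrix from the data frame in the usual way
## (model.matrix handles factors; the first column, the intercept, is dropped)
x <- model.matrix(outcome ~ ., data = MyData)[, -1]
y <- MyData$outcome

## 10-fold cross-validated lasso logistic regression (alpha = 1)
fit.1cv <- cv.glmnet(x, y, family = "binomial", alpha = 1, nfolds = 10)

## coefficients at the more conservative 1-SE lambda
coef(fit.1cv, s = "lambda.1se")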
Actually, with my current fit I get:

fit.1cv$lambda.min
[1] 0.036
fit.1cv$lambda.1se
[1] 0.121

coef(fit.1cv, s=fit.1cv$lambda.1se)
16 x 1 sparse Matrix of class "dgCMatrix"
                     1
(Intercept) -1.1707519
V1           0.0000000
V2           0.0000000
V3           0.0000000
V4           0.0000000
V5           0.0000000
V6           0.0000000
V7           0.0000000
V8           0.2865651
V9           0.0000000
V10          0.0000000
V11          0.0000000
V12          0.0000000
V13          0.0000000
V14          0.0000000
V15          0.0000000
Do you mean that I should select the one-parameter model (with only V8
included) shown above?
> > 2. Which model, I mean lasso or elastic net, should be selected? And
> > why? Both models chose the same variables but different coefficient
> > values.
> > You may want to read 'The Elements of Statistical Learning' to find some
> > info on the advantages of ridge/lasso/elnet compared. Lasso should work
> > fine in this relatively low-dimensional setting, although it depends on
> > the correlation structure of your covariates.
> > Depending on your goals, you may want to refit a standard logistic
> > regression with only the variables selected by the lasso: this avoids
> > the downward bias that is in (just about) every penalized regression.
fit1.glm <- glm(outcome ~ x1+x2+x3+x4, family="binomial", data=MyData)
summary(fit1.glm)

Call:
glm(formula = outcome ~ x1+x2+x3+x4, family = "binomial", data = MyData)

Deviance Residuals:
   Min      1Q  Median      3Q     Max
-1.806  -0.626  -0.438  -0.191   2.304

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -1.855      0.393   -4.72  2.3e-06 ***
x1             1.037      0.558    1.86   0.0630 .
x2             1.031      0.326    3.16   0.0016 **
x3            -0.591      0.323   -1.83   0.0678 .
x4             0.347      0.293    1.18   0.2368
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 114.717  on 103  degrees of freedom
Residual deviance:  87.178  on  99  degrees of freedom
AIC: 97.18

Number of Fisher Scoring iterations: 5
library(epicalc)
logistic.display(fit1.glm)
Logistic regression predicting postopDWI_HI2
          crude OR(95%CI)   adj. OR(95%CI)    P(Wald's test)  P(LR-test)
x1 1 vs 0 2.45 (0.98,6.13)  2.82 (0.95,8.42)  0.063           0.059
x2        2.76 (1.6,4.77)   2.8 (1.48,5.31)   0.002           < 0.001
x3        0.64 (0.36,1.14)  0.55 (0.29,1.04)  0.068           0.051
x4        2.07 (1.28,3.34)  1.41 (0.8,2.51)   0.237           0.234
Log-likelihood = -43.5889
No. of observations = 104
AIC value = 97.1777
The AUC of this glm model is 0.81, which indicates good predictive ability.
My next question, however, is which coefficient values I should present:
those from the glmnet model or those from the glm model? Is it O.K. to
select variables by glmnet and then present the coefficients (odds ratios)
from the glm fit?
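For the refitted glm itself, I suppose the odds ratios with their 95%
confidence intervals (and the AUC) could be obtained along these lines; this
is only a sketch, and using the pROC package for the AUC is just one option:

## odds ratios and profile-likelihood 95% CIs from the refitted glm
exp(cbind(OR = coef(fit1.glm), confint(fit1.glm)))

## AUC of the refitted model
library(pROC)
auc(roc(MyData$outcome, fitted(fit1.glm)))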
> >
> > 3. Is it O.K. to calculate the odds ratio by exp(coefficients)? And how
> > can you calculate a 95% confidence interval for the odds ratio?
> > Or is a 95% CI meaningless in this kind of analysis?
> > At this time, confidence intervals for lasso/elnet in GLM settings are an
> > open problem (the reason being that the L1 penalty is not differentiable).
> > Some 'solutions' exist (bootstrap, for one), but they have all been shown
> > to have (statistical) properties that make them - at the least - doubtful.
> > I know, because I'm working on this. Short answer: there is no way to do
> > this (at this time).
Well, thank you for a clear-cut answer. :-)
> >
> > HTH (and hang on there in Japan),
> > Nick Sabbe
Your answers helped me very much.
Actually, I live in the western part of Japan and fortunately there was no
damage in my neighborhood from the earthquake. Now we are worried about the
nuclear power plant...
Best regards,
KH