[R-pkgs] New glmnet package on CRAN
Trevor Hastie
hastie at stanford.edu
Mon Jun 2 20:08:16 CEST 2008
glmnet is a package that fits the regularization path for linear
regression and for two-class and multi-class logistic regression
models with "elastic net" regularization (a tunable mixture of L1 and
L2 penalties).
glmnet uses pathwise coordinate descent, and is very fast.
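To make "pathwise coordinate descent" concrete, here is a minimal sketch in plain Python for the linear-regression case. It is not the glmnet implementation. It assumes a squared-error loss scaled by 1/(2N), no intercept, and the penalty form given in this announcement; the function names (soft_threshold, coordinate_descent, fit_path) are hypothetical and chosen only for illustration.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def coordinate_descent(X, y, lam, a, beta=None, n_sweeps=200, tol=1e-10):
    """Cyclic coordinate descent for a single value of lam.

    Assumed objective (an illustration, not taken from the package):
        1/(2N) * sum_i (y_i - x_i . beta)^2
        + lam * ((1 - a) * sum_j beta_j^2 + a * sum_j |beta_j|)
    X is a list of N rows, each a list of p floats; no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p if beta is None else list(beta)
    # Full residual r_i = y_i - x_i . beta, kept up to date after each update.
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            v_j = sum(X[i][j] ** 2 for i in range(n)) / n
            denom = v_j + 2.0 * lam * (1.0 - a)
            if denom == 0.0:
                continue  # all-zero column with no quadratic penalty
            # Inner product of column j with the partial residual
            # (the residual with coordinate j's contribution added back).
            z_j = sum(X[i][j] * resid[i] for i in range(n)) / n + v_j * beta[j]
            new_b = soft_threshold(z_j, lam * a) / denom
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_b
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def fit_path(X, y, lambdas, a):
    """Fit along a sequence of lam values, warm-starting from the previous fit."""
    path, beta = [], None
    for lam in lambdas:
        beta = coordinate_descent(X, y, lam, a, beta)
        path.append(beta)
    return path
```

The "pathwise" part is the warm start in fit_path: each fit begins from the previous solution, so moving to a nearby value of the regularization parameter typically needs only a few sweeps.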
Some of the features of glmnet:
* by default it computes the path at 100 uniformly spaced (on the log
scale) values of the regularization parameter
* glmnet appears to be faster than any of the packages that are freely
available, in some cases by two orders of magnitude.
* recognizes and exploits sparse input matrices (à la the Matrix
package). Coefficient matrices are output in sparse matrix
representation.
* penalty is (1-a)*||beta||_2^2 + a*||beta||_1, where a is between 0
and 1; a=0 is the ridge penalty, a=1 is the lasso penalty.
For many correlated predictors, a=.95 or thereabouts improves the
performance of the lasso.
* convenient predict, plot, print, and coef methods
* variable-wise penalty modulation allows each variable's penalty to be
scaled by its own factor; if the factor is zero, that variable is
unpenalized and always enters the model
* glmnet uses a symmetric parametrization for multinomial, with
constraints enforced by the penalization.
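Two of the features above are easy to sketch in plain Python: the grid of regularization values equally spaced on the log scale, and the penalty formula with per-variable scaling. This is an illustration only, not the package's code. The function names, the lam_max starting point, and the ratio between the smallest and largest grid values are assumptions (the announcement states only that 100 log-spaced values are used), and applying each variable's factor to that variable's whole penalty term is likewise an assumption about how the modulation enters.

```python
import math


def lambda_grid(lam_max, n_lambda=100, ratio=1e-4):
    """Values from lam_max down to ratio * lam_max, equally spaced in log(lambda).

    lam_max and ratio are illustrative inputs, not values from the package.
    """
    log_hi = math.log(lam_max)
    log_lo = math.log(lam_max * ratio)
    step = (log_hi - log_lo) / (n_lambda - 1)
    return [math.exp(log_hi - k * step) for k in range(n_lambda)]


def elastic_net_penalty(beta, a, factors=None):
    """(1 - a) * ||beta||_2^2 + a * ||beta||_1 with optional per-variable factors.

    Each factor multiplies that variable's whole penalty term (an assumption);
    a factor of zero leaves the variable unpenalized.
    """
    if factors is None:
        factors = [1.0] * len(beta)
    return sum(f * ((1.0 - a) * b * b + a * abs(b))
               for b, f in zip(beta, factors))
```

For example, elastic_net_penalty([3.0, -4.0], 1.0) keeps only the L1 term, while passing factors=[0.0, 1.0] drops the first coefficient from the penalty entirely, which is why such a variable always enters the model.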
Other families such as poisson might appear in later versions of glmnet.
Examples of glmnet speed trials:
Newsgroup data: N=11,000, p=4 million, two-class logistic. 100 values
along lasso path. Time = 2 mins
14-class cancer data: N=144, p=16K, 14-class multinomial, 100 values
along lasso path. Time = 30 secs
Authors: Jerome Friedman, Trevor Hastie, Rob Tibshirani.
See our paper http://www-stat.stanford.edu/~hastie/Papers/glmnet.pdf for
implementation details and comparisons with other related software.
--
--------------------------------------------------------------------
Trevor Hastie hastie at stanford.edu
Professor & Chair, Department of Statistics, Stanford University
Phone: (650) 725-2231 (Statistics) Fax: (650) 725-8977
(650) 498-5233 (Biostatistics) Fax: (650) 725-6951
URL: http://www-stat.stanford.edu/~hastie
address: room 104, Department of Statistics, Sequoia Hall
390 Serra Mall, Stanford University, CA 94305-4065