[R] glmnet estimates and Tibshirani JRSSB 1996

Mon Nov 17 11:49:29 CET 2014

Dear all,

I have a question that hopefully is an R question and does not simply arise from my lack of understanding of the LASSO.

The code below generates two different sets of relationships between y and X, one in which both variables matter (coefficients .5 each, line 14) and one in which one of the two will be shrunk to zero (coefficients .9 and .01, line 13). Lines 15-18 normalize by demeaning y and creating an orthonormal X matrix. I assign a "budget" of 0.5 to the coefficients (line 21). Line 31 translates this budget to the implied lambda in the Lagrangian form of the Lasso (one potential source of error, but I hope I read JRSSB equation 6 correctly here).

In the scenario in which both variables matter, everything works fine: the sum of the coefficients is 0.5 as intended, and the expressions from Tibshirani, eq 6 (lines 24,25), and those from glmnet agree. If, however, line 13 is switched on, so that the second coefficient gets shrunk to 0, the sum of the coefficients no longer equals the budget of 0.5 for either expression (which still agree). Any thoughts on why I only seem to do it right in the case in which there is no shrinkage to 0?

Best,
Christoph

rm(list=ls())
library(glmnet)
library(mvtnorm)
set.seed(38)

N = 50
K = 2
SigCorr=.0
Sigma = matrix(c(1,SigCorr,SigCorr,1),ncol=2)
X = rmvnorm(N, mean=rnorm(K), sigma=Sigma, method="chol")

u = rnorm(N)
y = .9*X[,1]+.01*X[,2]+u
#y = .5*X[,1]+.5*X[,2]+u
y = y-mean(y)
X = scale(X)
vX = var(X)
X = sqrt(N/(N-1))*X%*%solve(chol(vX)) # generates orthonormal matrix

reg = lm(y~X-1)
budget = .5

# where LASSO estimates should be according to Tibshirani (JRSSB 1996), eq (6)
beta_1_LASSO = max(c(0,budget/2+(reg$coefficients[1]-reg$coefficients[2])/2))
beta_2_LASSO = max(c(0,budget/2-(reg$coefficients[1]-reg$coefficients[2])/2))
beta_1_LASSO
beta_2_LASSO
beta_1_LASSO+beta_2_LASSO

# using glmnet. notice it minimizes 1/2*RSS+penalty, unlike in JRSSB, with implications for relationship betahat and lambda
lambda = (reg$coefficients[1]+reg$coefficients[2]-budget)/2
lasso.mod=glmnet(X,y,alpha=1,lambda=lambda)
coef(lasso.mod)
sum(coef(lasso.mod))

--
Prof. Dr. Christoph Hanck
Lehrstuhl für Ökonometrie
Universität Duisburg-Essen

+49 201 183 2263
christoph.hanck at vwl.uni-due.de<mailto:christoph.hanck at vwl.uni-due.de>
www.oek.wiwi.uni-due.de<http://www.oek.wiwi.uni-due.de/>

	[[alternative HTML version deleted]]