[R] Lasso for k-subset regression

Steve Lianoglou mailinglist.honeypot at gmail.com
Mon Jun 6 16:41:34 CEST 2011


Hi,

On Sun, Jun 5, 2011 at 9:12 PM, Dae-Jin Lee <lee.daejin at gmail.com> wrote:
> Dear R-users
>
> I'm trying to use the lasso in the lars package for subset regression. I have a
> large matrix of size 1000x100, and my aim is to select a subset of k of the 100
> variables.
>
> Is there any way in lars to fix the number k (i.e. to select the best 10
> variables)?
>
> library(lars)
>
> aa=lars(X,Y,type="lasso",max.steps=200)
>
> plot(aa,plottype="Cp")
> aa$RSS
> which.min(aa$RSS)
> round(aa$beta,2)
>
> aa$beta[which.min(aa$RSS),]    # coefficients at the step that minimizes the RSS
>
> # index of the selected variables (!= 0 rather than > 0, since lasso
> # coefficients can be negative)
> lasso.ind=which(as.vector(aa$beta[which.min(aa$RSS),]) != 0)
>
> print(lasso.ind)   # this usually gives more than 10 variables (also depends on max.steps in lars)

First off: I'd suggest using the glmnet package instead of lars.
Setting its `alpha` parameter to 1 (the default) gives you the lasso, but you
can also try other values of alpha between 0 and 1 to see whether an
elastic-net penalty works better.
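A minimal sketch of comparing the lasso with an elastic-net penalty, assuming simulated data of the same shape as in the question (1000 x 100); `X`, `Y` and the true coefficients are made up for illustration. It uses `cv.glmnet` with a fixed `foldid` so that the cross-validated errors for the two alpha values are computed on the same folds and can be compared directly.

```r
library(glmnet)

set.seed(1)
n <- 1000; p <- 100
X <- matrix(rnorm(n * p), n, p)        # hypothetical design matrix
beta <- c(rep(2, 10), rep(0, p - 10))  # hypothetical truth: first 10 variables matter
Y <- drop(X %*% beta + rnorm(n))

# alpha = 1 (the default) is the lasso; 0 < alpha < 1 is an elastic net.
# Reuse one fold assignment so both fits are cross-validated on the same splits.
foldid <- sample(rep(1:10, length.out = n))
cv_err <- sapply(c(1, 0.5), function(a) {
  cvfit <- cv.glmnet(X, Y, alpha = a, foldid = foldid)
  min(cvfit$cvm)  # smallest mean cross-validated error along the lambda path
})
names(cv_err) <- c("lasso", "enet_0.5")
print(cv_err)
```

The alpha values tried here (1 and 0.5) are arbitrary; a finer grid works the same way.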

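Once you are using glmnet, look at its `dfmax` and `pmax` arguments:
`dfmax` limits the maximum number of variables in the model, and `pmax`
limits the maximum number of variables that are ever allowed to be nonzero
along the path. Either one lets you stop the path once about k variables
have entered, instead of picking k after the fact.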
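A minimal sketch of picking (at most) k = 10 variables this way, again on simulated data where `X`, `Y` and the coefficients are hypothetical. It chooses lambda by the number of nonzero coefficients (`fit$df`) rather than by the smallest RSS; along a lasso path the training RSS does not increase as lambda shrinks, so minimizing it tends to select the largest model, which is why the original code returned more than 10 variables. Because the path is computed on a grid of lambda values, two variables can enter between grid points, so the sketch takes the least-penalized model with *at most* k variables rather than assuming exactly k is hit.

```r
library(glmnet)

set.seed(1)
n <- 1000; p <- 100; k <- 10
X <- matrix(rnorm(n * p), n, p)                      # hypothetical data
Y <- drop(X[, 1:15] %*% runif(15, 1, 3) + rnorm(n))  # 15 hypothetical true signals

# dfmax stops the lambda path early, once the model grows past about k variables
fit <- glmnet(X, Y, alpha = 1, dfmax = k)

# fit$df = number of nonzero coefficients (intercept excluded) at each lambda;
# take the smallest lambda on the path whose model has at most k variables
idx <- max(which(fit$df <= k))
lam <- fit$lambda[idx]

b <- as.matrix(coef(fit, s = lam))[-1, 1]  # coefficients, intercept dropped
lasso.ind <- which(b != 0)                 # != 0, since coefficients can be negative
print(lasso.ind)
```

If you need exactly k variables and the grid skips over k, passing a denser `lambda` sequence (or a larger `nlambda`) to `glmnet` is one option.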

HTH,
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
