[R] Lasso for k-subset regression
Steve Lianoglou
mailinglist.honeypot at gmail.com
Mon Jun 6 16:41:34 CEST 2011
Hi,
On Sun, Jun 5, 2011 at 9:12 PM, Dae-Jin Lee <lee.daejin at gmail.com> wrote:
> Dear R-users
>
> I'm trying to use lasso in lars package for subset regression, I have a
> large matrix of size 1000x100 and my aim is to select a subset k of the 100
> variables.
>
> Is there any way in lars to fix the number k (i.e. to select the best 10
> variables)
>
> library(lars)
>
> aa=lars(X,Y,type="lasso",max.steps=200)
>
> plot(aa,plottype="Cp")
> aa$RSS
> which.min(aa$RSS)
> round(aa$beta,2)
>
> aa$beta[which.min(aa$RSS),] # find which coefficients minimizes the RSS
>
> lasso.ind=which((as.vector((aa$beta[which.min(aa$RSS),])))>0) # index of variables
>
> print(lasso.ind) # this usually gives more than 10 variables (also depends on the max.steps in lars)
First off: I'd suggest using the glmnet package instead of lars.
Setting its `alpha` parameter to 1 (the default) gives you the lasso, but
you can also try values of alpha between 0 and 1 to see whether an
elastic-net-type penalty works better.
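Once you are using glmnet, check its `dfmax` and `pmax` arguments:
`dfmax` limits the number of variables in the model, and `pmax` limits
the number of variables that are ever nonzero along the path.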
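Here is a rough sketch of how that could look for k = 10. I don't have
your data, so X, Y and beta_true below are simulated stand-ins (my
assumption, not your setup), and picking the last step of the path with
at most k nonzero coefficients is just one plausible way to read off a
k-variable model, not the only one.

```r
library(glmnet)

set.seed(1)
n <- 1000; p <- 100; k <- 10

# Simulated stand-in for the poster's data (hypothetical: only the
# first 10 columns of X actually influence Y).
X <- matrix(rnorm(n * p), n, p)
beta_true <- c(rep(2, k), rep(0, p - k))
Y <- drop(X %*% beta_true + rnorm(n))

# alpha = 1 is the lasso. dfmax asks glmnet to stop the path once the
# model grows beyond k variables; the last step it returns may still
# overshoot k, so filter on fit$df below.
fit <- glmnet(X, Y, alpha = 1, dfmax = k)

# fit$df is the number of nonzero coefficients at each lambda.
idx <- max(which(fit$df <= k))       # least-penalized step with <= k variables
sel <- which(fit$beta[, idx] != 0)   # != 0: lasso coefficients can be negative

print(sel)               # column indices of the selected variables
print(fit$lambda[idx])   # the lambda that produced them
```

Note the `!= 0` rather than `> 0` when picking the selected variables;
testing for `> 0` would silently drop any variable with a negative
coefficient. Also keep in mind that the path can jump over k (several
variables entering between two lambda values), so you may end up with
fewer than k variables.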
HTH,
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact