[R] Ridge Regression variable selection

Ben Bolker bbolker at gmail.com
Thu Dec 27 16:14:39 CET 2012


Frank Harrell <f.harrell <at> vanderbilt.edu> writes:

> 
> Unlike L1 (lasso) regression or elastic net (mixture of L1 and L2), L2 norm
> regression (ridge regression) does not select variables.  Selection of
> variables would not work properly, and it's unclear why you would want to
> omit "apparently" weak variables anyway.
> Frank
> 

  ... and this was cross-posted from StackOverflow, where I said more
or less the same thing about ridge regression (I haven't gotten into
the "don't do variable selection" issue yet; I was waiting ...)

http://stackoverflow.com/questions/14046569/ridge-regression-in-r

  For the other questions (what are the lambda values?  What does
the output mean?) I would suggest getting a copy of _Modern
Applied Statistics with S_ [the book that the MASS package was
written to accompany] and reading the relevant chapter.
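
  In the meantime, here is a minimal sketch of how the pieces fit
together. It uses the built-in longley data (as in the ?lm.ridge
example) as a stand-in, since I don't have diabetes10.txt; the names
dat, lambdas, fit and best are made up for illustration. Note that in
the quoted call, "y=" is matched to lm.ridge's logical y argument
rather than supplying the formula, so the model that was fitted is
probably not the one intended. The commented-out line is a guess at
the intended call and assumes the response column is called y.

```r
library(MASS)

## Stand-in data (assumption): longley, with its first column as the response
dat <- longley
names(dat)[1] <- "y"

## lambda is the ridge penalty; we fit the model over a grid of values
lambdas <- seq(0, 0.1, by = 0.001)
fit <- lm.ridge(y ~ ., data = dat, lambda = lambdas)

## Hypothetical equivalent for the poster's data (response name "y" is assumed):
## fit <- lm.ridge(y ~ age + sex + bmi + map + tc, data = diabetes10,
##                 lambda = lambdas)

## select() prints suggested penalty values: the HKB and L-W estimates
## and the grid value with the smallest generalized cross-validation score
select(fit)

## Pick out the grid value that minimizes GCV by hand
best <- which.min(fit$GCV)
lambdas[best]

## Coefficients (intercept first) at that lambda; all predictors are kept
coef(fit)[best, ]
```

  select() only reports candidate values of lambda; every predictor
keeps a (shrunken) nonzero coefficient, so nothing in the output says
which variables were "selected".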

> maths123 wrote
> > I have a .txt file containing a dataset with 500 samples. There are 10
> > variables.
> > 
> > I am trying to perform variable selection using the ridge regression
> > method but I am very confused. 
> > 
> > I have input the following:
> > diabetes10<-read.table("diabetes10.txt", header=TRUE)
> > diabetes10
> > library(MASS)
> > select(lm.ridge(y=diabetes10 ~ age+sex+bmi+map+tc , diabetes10,
> >                lambda = seq(0,0.1,0.0001)))
> > 
> > First of all, I am confused about the lambda values.
> > Second of all, my output is:
> > 
> > modified HKB estimator is -1.334073e-29 
> > modified L-W estimator is -5.610557e-28 
> > smallest value of GCV  at 1e-04 
> > 
> > 
> > I have no idea what that is telling me and where I am supposed to work out
> > which variables have been selected.
> >



