[R] Questions regarding the lasso and glmnet

Patrick Breheny patrick.breheny at uky.edu
Sun May 29 12:41:36 CEST 2011


On 05/28/2011 12:54 PM, Ben Haller wrote:
> 1. Is my choice of glmnet() ok?  On what basis should I choose
> glmnet() vs. lars()?

LARS is for linear regression; your outcome is binary.
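
As a rough sketch of what that looks like in practice (the data below are simulated and the object names are made up for illustration, not taken from your analysis), a lasso-penalized logistic regression can be requested from glmnet with family = "binomial":

```r
library(glmnet)

set.seed(1)
n <- 200
x <- matrix(rnorm(n * 5), n, 5)              # simulated predictors (hypothetical)
colnames(x) <- paste0("x", 1:5)
y <- rbinom(n, 1, plogis(x[, 1] - x[, 2]))   # simulated binary outcome

# Lasso-penalized logistic regression; alpha = 1 (the default) is the lasso penalty
fit <- glmnet(x, y, family = "binomial", alpha = 1)

# Cross-validation over the lambda path to pick the amount of penalization
cvfit <- cv.glmnet(x, y, family = "binomial")
coef(cvfit, s = "lambda.min")
```

The cross-validated choice of lambda is only one option; "lambda.1se" gives a sparser fit.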

> 2. Is the way I'm scaling the variables before calling glmnet()
> correct?  Or should the squares themselves be centered and scaled?

> 3. Is my model matrix correct, or do I have a problem with the scale
> of the interaction variables?

glmnet centers and scales the variables itself (standardize = TRUE is 
the default) and reports the coefficients on the original scale.  You do 
not need to do so.
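
One way to convince yourself of this is to change the units of a column and refit.  The following is a small sketch on simulated data (the variable names and the value of s are arbitrary choices for illustration); with the default standardize = TRUE, the fit should be unaffected apart from the coefficient being rescaled back to the new units:

```r
library(glmnet)

set.seed(1)
n <- 200
x <- matrix(rnorm(n * 3), n, 3)              # simulated predictors (hypothetical)
y <- rbinom(n, 1, plogis(x[, 1] - x[, 2]))   # simulated binary outcome

x_scaled <- x
x_scaled[, 1] <- 100 * x[, 1]                # change the units of the first column

# standardize = TRUE is the default in both calls
fit_raw    <- glmnet(x, y, family = "binomial")
fit_scaled <- glmnet(x_scaled, y, family = "binomial")

s <- 0.05                                    # an arbitrary penalty value for comparison
b_raw    <- as.numeric(coef(fit_raw, s = s))
b_scaled <- as.numeric(coef(fit_scaled, s = s))

# Rows: intercept, then one row per column of x
cbind(raw = b_raw, scaled = b_scaled)
```

The coefficient for the first column in the rescaled fit should be the original one divided by 100, and the others should agree.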

> 4. Is it a problem that the lasso fit gives non-zero coefficients for
> interactions whose underlying terms have zero coefficients?

This is going to occur with any automated model selection procedure 
unless specifically disallowed.

> 5. Is there any way to choose a simple explanatory model, smaller
> than the best predictive model supported by the data, that is less
> arbitrary / subjective?

You have 5 variables.  Variable selection is not your goal.  What you 
are trying to do is fit a curve (as opposed to a line) through your 
data, possibly along with interactions.  I would suggest looking into 
splines, provided for example in the mgcv package.
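
A minimal sketch of that approach, assuming a binary outcome and two continuous predictors (the data are simulated and the names x1, x2, and d are hypothetical):

```r
library(mgcv)

set.seed(1)
n <- 500
d <- data.frame(x1 = runif(n), x2 = runif(n))   # simulated predictors (hypothetical)
d$y <- rbinom(n, 1, plogis(2 * sin(2 * pi * d$x1) + d$x2 - 0.5))

# Smooth main effects via s(), plus a smooth interaction via ti(),
# which excludes the main effects already captured by the s() terms
fit <- gam(y ~ s(x1) + s(x2) + ti(x1, x2),
           family = binomial, data = d, method = "REML")

summary(fit)   # effective degrees of freedom indicate how curved each term is
```

plot(fit, pages = 1) draws the estimated smooths, which is usually the easiest way to see the shape of each effect.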

-- 
Patrick Breheny
Assistant Professor
Department of Biostatistics
Department of Statistics
University of Kentucky


