[R] Behaviour of dfmax in glmnet

Wed Feb 27 23:56:11 CET 2019

Hi,

I am new to <i>glmnet</i>, so I do not yet fully understand what the various
parameters do. I am trying to build a multinomial classifier that restricts
the number of features used in the model. From reading the docs and some
answers on this forum, I understand <i>dfmax</i> is the way to do it. I
played around with it a bit; I have a couple of questions and would
appreciate some help:

<h3>Setup</h3>

For a particular dataset, I want to restrict the number of features to 3;

the original data has 126 features. Here's what I run:

library(glmnet)
library(broom)  # provides tidy() for glmnet fits

fit <- glmnet(data.matrix(X), data.matrix(y), family = 'multinomial', dfmax = 3)

d <- data.frame(tidy(fit))

This is the value of <i>d</i> (inserting a screenshot since the table
columns get disturbed by the formatting):

My questions about the output:

[1] I see multiple values of <i>lambda</i> in there; it looks like glmnet
tries to fit lambdas that get the number of terms close to dfmax=3. So it's
less like the LARS algorithm (in the sense that we don't move stagewise by
adding variables) and more about finding the lambdas for regularization that
lead to the intended dfmax. Is this right?

[2] I'm guessing alpha plays a role in how close we can get to dfmax. At
alpha=1, where we're doing lasso, it's easier to get close to dfmax than at
alpha=0, where we're doing ridge. Is this understanding correct?

[3] A "neighborhood" of dfmax is the best we can do, it would seem. Or am I
missing a parameter that gets me to the model with exactly dfmax terms (FYI:
alpha=1 doesn't seem to get me to the precise number of nonzero terms
either, at least on this dataset)?

[4] What does pmax do?
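
To make the questions above concrete, here is a minimal sketch on simulated data (not the 126-feature dataset in question) of one way the path could be inspected. It lists the number of active features at each lambda via the `df` and `lambda` components of the fit, then picks the largest model that stays within the target. The simulated data, the variable names `scores`, `path`, `target`, `ok`, `idx` and `active`, and the selection rule itself are assumptions for illustration only, not something glmnet prescribes.

```r
library(glmnet)

# Simulated stand-in data (an assumption; not the dataset described above)
set.seed(1)
n <- 300
p <- 20
X <- matrix(rnorm(n * p), n, p)
# Three classes, each driven mainly by one of the first three columns
scores <- 2 * X[, 1:3] + matrix(rnorm(n * 3), n, 3)
y <- factor(max.col(scores))

fit <- glmnet(X, y, family = "multinomial", dfmax = 3)

# One row per lambda on the path: for multinomial fits, df counts the
# variables with a nonzero coefficient in at least one class
path <- data.frame(lambda = fit$lambda, df = fit$df)
print(path)

# Hypothetical selection rule: among models with at most `target` active
# variables, take the one with the most variables (largest lambda on ties)
target <- 3
ok <- which(fit$df <= target)
idx <- ok[which.max(fit$df[ok])]

# Variables that are nonzero in any class at the chosen lambda
active <- sort(unique(unlist(lapply(fit$beta, function(b) which(b[, idx] != 0)))))
print(fit$lambda[idx])
print(active)
```

If no lambda on the path has exactly `target` active variables, this rule falls back to the closest smaller model, which is the "neighborhood" behaviour asked about in [3].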

-------------- next part --------------
A non-text attachment was scrubbed...
Name: dfmax.PNG
Type: image/png
Size: 54147 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20190227/92b3a1a1/attachment.png>