[R] Behaviour of dfmax in glmnet
Abhishek Ghose
@bh|@hek@gho@e@82 @end|ng |rom gm@||@com
Wed Feb 27 23:56:11 CET 2019
Hi,
I am new to <i>glmnet</i>, so I do not yet understand fully what the various
parameters do. I am trying to build a multinomial classifier which restricts
the number of features used in the model. From reading the docs and some
answers on this forum, I understand <i>dfmax</i> is the way to do it. I played
around with it a bit; I have a few questions and would appreciate some
help:
<h3>Setup</h3>
For a particular dataset, I want to restrict the number of features to 3;
the original data has 126 features. Here's what I run:
library(glmnet)  # provides glmnet()
library(broom)   # provides tidy()
fit <- glmnet(data.matrix(X), data.matrix(y), family='multinomial', dfmax=3)
d <- data.frame(tidy(fit))
This is the value of <i>d</i> (inserting a screenshot since the table columns
get disturbed by the formatting):
My questions about the output:
[1] I see multiple values of <i>lambda</i> in there; it looks like glmnet tries
to fit lambdas that get the number of terms close to dfmax=3. So it's less
like the LARS algorithm (in the sense that we don't move stagewise by adding
variables) and more about finding the lambdas for regularization that
lead to the intended dfmax. Is this right?
[2] I'm guessing alpha plays a role in how close we can get to dfmax. At
alpha=1, where we're doing lasso, it should be easier to get close to dfmax
than at alpha=0, where we're doing ridge. Is this understanding
correct?
[3] A "neighborhood" of dfmax is the best we can do, it would seem. Or am I
missing a parameter that gets me to the model with the exact dfmax? (FYI:
alpha=1 doesn't seem to get me to the precise number of nonzero terms
either, at least on this dataset.)
[4] What does pmax do?
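
To make questions [1] and [3] concrete, here is a minimal sketch of how the fitted path could be inspected to pick the largest model that uses no more than 3 features. It runs on simulated data, not the dataset above: the sizes, the seed, and the names X, y, path, idx and used are all made up for illustration. It relies only on the lambda, df and beta components of the object returned by glmnet.

```r
library(glmnet)

set.seed(1)
n <- 200
p <- 20
X <- matrix(rnorm(n * p), n, p)

# Hypothetical 3-class response driven by the first three columns of X
scores <- X[, 1:3] + matrix(rnorm(n * 3, sd = 0.5), n, 3)
y <- factor(max.col(scores))

fit <- glmnet(X, y, family = "multinomial", dfmax = 3)

# One row per lambda on the path. For a multinomial fit, df counts the
# features that have a nonzero coefficient in at least one class.
path <- data.frame(lambda = fit$lambda, df = fit$df)
print(path)

# Last position on the path (smallest lambda) that still uses <= 3 features
idx <- max(which(fit$df <= 3))

# fit$beta is a list with one sparse coefficient matrix per class
# (features in rows, lambdas in columns); collect the features in use
nonzero <- sapply(fit$beta, function(b) b[, idx] != 0)
used <- which(rowSums(nonzero) > 0)

cat("lambda:", fit$lambda[idx], " features used:", used, "\n")
```

Printing path shows how df changes from one lambda to the next, which is the behaviour the questions above are about; filtering on fit$df afterwards is one possible workaround, not necessarily the intended use of dfmax.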
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dfmax.PNG
Type: image/png
Size: 54147 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20190227/92b3a1a1/attachment.png>