[R] Can glmnet handle models with numeric and categorical data?

Paul Smith phhs80 at gmail.com
Fri Aug 5 13:00:06 CEST 2011


On Fri, Aug 5, 2011 at 8:45 AM, Martin Maechler
<maechler at stat.math.ethz.ch> wrote:
> Note the following: As soon as you use "categorical predictors",
> i.e., factors, and particularly when these have many levels (instead of just
> being binary), the resulting model matrix is often sparse,
> i.e. contains many zeros.
> When the matrix is "really sparse", say,
>     #{zeros} / #{non-zeros} >= 10
> it can pay off considerably to use the sparse matrices that the
> 'Matrix' package provides (you have 'Matrix' as part of your R
> installation).
>
> For exactly this reason,  'glmnet'
> has supported the use of sparse matrices for a long time,
> and we have provided the convenience function
>    sparse.model.matrix()  {package 'Matrix'}
> for easy construction of such matrices.
>
> There's also a very small extension package  'MatrixModels'
> which goes one step further, with its function
>      model.Matrix(..... sparse = TRUE/FALSE)
> but you would not need that for using the sparseMatrix in
> 'glmnet'.
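
For readers of the archive, here is a minimal sketch of the workflow
described above. The data frame `dat`, its columns `x1`, `f1`, `f2`, and
the response `y` are hypothetical and not from this thread. The sketch
builds a sparse design matrix with sparse.model.matrix(), computes the
zeros to non-zeros ratio mentioned above, and passes the matrix to
glmnet() if that package is installed.

```r
library(Matrix)

set.seed(1)
n <- 200

# Hypothetical data: one numeric predictor and two 5-level factors
dat <- data.frame(
  x1 = rnorm(n),
  f1 = factor(sample(letters[1:5], n, replace = TRUE)),
  f2 = factor(sample(LETTERS[1:5], n, replace = TRUE))
)
y <- rnorm(n)

# Sparse design matrix; "- 1" drops the intercept column,
# because glmnet fits its own intercept by default
X <- sparse.model.matrix(~ . - 1, data = dat)

# Ratio #{zeros} / #{non-zeros} of the design matrix
nnz   <- nnzero(X)
ratio <- (prod(dim(X)) - nnz) / nnz

# glmnet accepts the sparse matrix directly as its 'x' argument
if (requireNamespace("glmnet", quietly = TRUE)) {
  fit <- glmnet::glmnet(X, y)
}
```

With only two factors the ratio stays small; it grows as more factors
with many levels are added to the model.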

Thanks, Martin. In my case, the number of potential predictors is high,
and many of them are factors with 5 categories. With
sparse.model.matrix(), I am getting the following error:

«Error: C stack usage is too close to the limit.»

I realize that my sparse matrix is huge, and that the error given by
sparse.model.matrix() may be perfectly justified, but I wonder whether
this problem could be overcome by having sparse.model.matrix() use
dynamic memory instead of static memory.
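
One possible workaround, which is only an assumption and not something
confirmed in this thread, is to avoid one huge formula and instead build
a sparse indicator block per factor, then bind the blocks together. The
helper `factor_block()` and the data frame `dat` below are hypothetical.
The sketch assumes no missing values and uses full indicator coding (one
column per level, no reference level dropped), which differs from the
default treatment contrasts of sparse.model.matrix().

```r
library(Matrix)

set.seed(2)
n <- 100

# Hypothetical data: one numeric column and two 5-level factors
dat <- data.frame(
  x1 = rnorm(n),
  f1 = factor(sample(letters[1:5], n, replace = TRUE)),
  f2 = factor(sample(LETTERS[1:5], n, replace = TRUE))
)

# Hypothetical helper: sparse indicator matrix for a single factor.
# Row i gets a single 1 in the column of its level (assumes no NAs).
factor_block <- function(f, name) {
  f <- droplevels(as.factor(f))
  m <- sparseMatrix(i = seq_along(f), j = as.integer(f), x = 1,
                    dims = c(length(f), nlevels(f)))
  colnames(m) <- paste0(name, levels(f))
  m
}

is_fac <- vapply(dat, is.factor, logical(1))

# One block per factor column, bound column-wise
blocks <- Map(factor_block, dat[is_fac], names(dat)[is_fac])
X_fac  <- do.call(cbind, unname(blocks))

# Numeric columns converted to a sparse matrix as they are
X_num <- Matrix(as.matrix(dat[!is_fac]), sparse = TRUE)

X <- cbind(X_num, X_fac)
```

Whether this sidesteps the C stack error depends on what triggers it in
the first place, which is not established here.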

Paul


