[R] Can glmnet handle models with numeric and categorical data?
Martin Maechler
maechler at stat.math.ethz.ch
Fri Aug 5 09:45:27 CEST 2011
>>>>> "PS" == Paul Smith <phhs80 at gmail.com>
>>>>> on Fri, 5 Aug 2011 00:30:59 +0100 writes:
PS> On Fri, Aug 5, 2011 at 12:02 AM, Marc Schwartz
PS> <marc_schwartz at me.com> wrote:
>>> Can the x matrix in the glmnet() function of the glmnet
>>> package be a data.frame with numeric columns and factor
>>> columns? I am asking this because I have a model with
>>> both numeric and categorical predictors, which I would
>>> like to study with glmnet. I have already tried to use a
>>> data.frame, but with no success -- as far as I know, the
>>> matrix object can only have data of a single type. Is
>>> there some way of circumventing this problem?
>>
>> My recollection is that you would use ?model.matrix on
>> the data frame to create the requisite matrix input for
>> glmnet().
>>
>> The caution, however, is that glmnet() standardizes the
>> input covariates, which is not appropriate for
>> factors. Thus, you would want to set 'standardize =
>> FALSE' and use appropriate methods for pre-processing
>> the continuous variables.
PS> Again, Marc, thanks a lot for your very helpful answer --
PS> I had completely overlooked model.matrix().
Note the following: As soon as you use "categorical predictors",
i.e., factors, and particularly when these have many levels (instead of just
being binary), the resulting model matrix is often sparse,
i.e. contains many zeros.
When the matrix is "really sparse", say,
    #{zeros} / #{non-zeros} >= 10
it can pay off considerably to use the sparse matrices that the 'Matrix'
package provides ('Matrix' is a recommended package, so it is already
part of your R installation).
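To get a feel for that ratio, here is a minimal sketch in base R. The
data frame 'd', its columns 'x' and 'f', and the choice of 100 factor
levels are assumptions made up purely for illustration:

```r
## Hypothetical data: one numeric predictor and one factor with 100 levels
set.seed(1)
n <- 1000
d <- data.frame(x = rnorm(n),
                f = factor(sample(1:100, n, replace = TRUE), levels = 1:100))

## Dense model matrix: intercept + x + 99 dummy columns for f
X <- model.matrix(~ x + f, data = d)

## Ratio  #{zeros} / #{non-zeros}
n.zero    <- sum(X == 0)
n.nonzero <- sum(X != 0)
n.zero / n.nonzero
```

Each row has at most three non-zero entries (intercept, x, and one
dummy) out of 101 columns, so the ratio is far above 10 in this setup.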
For exactly this reason, 'glmnet'
has supported the use of sparse matrices for a long time,
and we have provided the convenience function
    sparse.model.matrix() {package 'Matrix'}
for easy construction of such matrices.
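A minimal sketch of how this might look in practice. The data ('d',
'x', 'f', 'y') are made up for illustration, and the glmnet() call is
only run if the 'glmnet' package happens to be installed:

```r
library(Matrix)  # provides sparse.model.matrix()

## Hypothetical data: numeric predictor, 100-level factor, numeric response
set.seed(1)
n <- 1000
d <- data.frame(x = rnorm(n),
                f = factor(sample(1:100, n, replace = TRUE), levels = 1:100))
d$y <- rnorm(n)

## Sparse analogue of model.matrix(); the intercept column is dropped
## because glmnet() fits its own intercept by default
Xs <- sparse.model.matrix(~ x + f, data = d)[, -1]
class(Xs)  # a sparse matrix class from 'Matrix'

## Fit the (default, gaussian) model only if glmnet is available
if (requireNamespace("glmnet", quietly = TRUE)) {
  fit <- glmnet::glmnet(Xs, d$y)
}
```

If you follow the advice quoted above about standardization, you would
add 'standardize = FALSE' to the glmnet() call.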
There's also a very small extension package 'MatrixModels'
which goes one step further, with its function
    model.Matrix(..... sparse = TRUE/FALSE)
but you would not need that in order to use a sparse matrix with
'glmnet'.
--
Martin Maechler, ETH Zurich