[Rd] [patch] Support many columns in model.matrix
Karl Millar
kmillar at google.com
Mon Feb 29 19:22:51 CET 2016
Thanks.
Couldn't you implement model.matrix(..., sparse = TRUE) with a small
amount of R code similar to MatrixModels::model.Matrix ?
On Mon, Feb 29, 2016 at 10:01 AM, Martin Maechler
<maechler at stat.math.ethz.ch> wrote:
>>>>>> Karl Millar via R-devel <r-devel at r-project.org>
>>>>>> on Fri, 26 Feb 2016 15:58:20 -0800 writes:
>
> > Generating a model matrix with very large numbers of
> > columns overflows the stack and/or runs very slowly, due
> > to the implementation of TrimRepeats().
>
> > This patch modifies it to use Rf_duplicated() to find the
> > duplicates. This makes the running time linear in the
> > number of columns and eliminates the recursive function
> > calls.
>
> Thank you, Karl.
> I've committed this (very slightly modified) to R-devel,
>
> (also after looking for a an example that runs on a non-huge
> computer and shows the difference) :
>
> nF <- 11 ; set.seed(1)
> lff <- setNames(replicate(nF, as.factor(rpois(128, 1/4)), simplify=FALSE), letters[1:nF])
> str(dd <- as.data.frame(lff)); prod(sapply(dd, nlevels))
> ## 'data.frame': 128 obs. of 11 variables:
> ## $ a: Factor w/ 3 levels "0","1","2": 1 1 1 2 1 2 2 1 1 1 ...
> ## $ b: Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 2 1 1 1 ...
> ## $ c: Factor w/ 3 levels "0","1","2": 1 1 1 2 1 1 1 2 1 1 ...
> ## $ d: Factor w/ 3 levels "0","1","2": 1 1 2 2 1 2 1 1 2 1 ...
> ## $ e: Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 2 1 ...
> ## $ f: Factor w/ 2 levels "0","1": 2 1 2 1 2 1 1 2 1 2 ...
> ## $ g: Factor w/ 4 levels "0","1","2","3": 2 1 1 2 1 3 1 1 1 1 ...
> ## $ h: Factor w/ 4 levels "0","1","2","4": 1 1 1 1 2 1 1 1 1 1 ...
> ## $ i: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 2 ...
> ## $ j: Factor w/ 3 levels "0","1","2": 1 2 3 1 1 1 1 1 1 1 ...
> ## $ k: Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
> ##
> ## [1] 139968
>
> system.time(mff <- model.matrix(~ . ^ 11, dd, contrasts = list(a = "contr.helmert")))
> ## user system elapsed
> ## 0.255 0.033 0.287 --- *with* the patch on my desktop (16 GB)
> ## 1.489 0.031 1.522 --- for R-patched (i.e. w/o the patch)
>
>> dim(mff)
> [1] 128 139968
>> object.size(mff)
> 154791504 bytes
>
> ---
>
> BTW: These example would gain tremendously if I finally got
> around to provide
>
> model.matrix(........, sparse = TRUE)
>
> which would then produce a Matrix-package sparse matrix.
>
> Even for this somewhat small case, a sparse matrix is a factor
> of 13.5 x smaller :
>
>> s1 <- object.size(mff); s2 <- object.size(M <- Matrix::Matrix(mff)); as.vector( s1/s2 )
> [1] 13.47043
>
> I'm happy to collaborate with you on adding such a (C level)
> interface to sparse matrices for this case.
>
> Martin Maechler
More information about the R-devel
mailing list