[Rd] reshape() makes R run out of memory (PR#14121)

hadley wickham h.wickham at gmail.com
Thu Dec 10 01:10:45 CET 2009


> Yes. The culprit would seem to be interaction(), as in
>
>> x <- y <- z <- 1:999
>> i <- interaction(x,y,z, drop=TRUE)
> Error: cannot allocate vector of size 3.7 Gb
>
> which is happening due to the occurrence of three idvar variables. This
> works basically as interaction(x,y,z)[,drop=TRUE], i.e. it first creates a
> factor with 999^3 levels, and removes the empty levels afterward.
>
> In the absence of a better interaction(), you might try making your own
> single idvar as do.call("paste",tbl[,c("ID", "DATE1", "DATE2")]) or so.
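
For reference, the paste() workaround suggested above might look roughly
like the following. This is only a sketch: the data are made up (only the
column names ID, DATE1 and DATE2 come from the thread), and the explicit
sep = "_" is my own assumption to keep the fields from running together.

```r
# Made-up example data; only the column names come from the thread.
tbl <- data.frame(
  ID    = c(1, 1, 2, 2),
  DATE1 = c("2009-01-01", "2009-01-01", "2009-01-01", "2009-02-01"),
  DATE2 = c("2009-03-01", "2009-03-01", "2009-03-01", "2009-03-01"),
  stringsAsFactors = FALSE
)

# Build one string key per row from the three id columns.
key <- do.call("paste", c(tbl[, c("ID", "DATE1", "DATE2")], sep = "_"))

# factor() only creates levels for keys that actually occur in the data,
# so the full cross-product of levels is never built.
idvar <- factor(key)
nlevels(idvar)
```

The resulting idvar can then be used as the single id variable in place of
the three original columns.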

There's also ninteraction() in the plyr package, which is designed
to generate a unique integer for each combination (while maintaining
the original order of the data and any missing combinations) as
efficiently as possible.  It's much faster than interaction(..., drop =
TRUE), and I would expect it to be faster than paste() as well, since it
works with integers rather than strings.
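
To illustrate the general idea of an integer id per combination, here is a
minimal base R sketch. It is not the plyr implementation; combo_id() is a
hypothetical helper written only for this example, and it assumes all
inputs have the same (non-zero) length.

```r
# Hypothetical helper, NOT plyr's code: combine several id vectors into one
# integer id without ever creating the full cross-product of levels.
combo_id <- function(...) {
  vars <- list(...)
  key <- NULL
  for (v in vars) {
    # Integer code for this variable, numbered by first appearance.
    code <- match(v, unique(v))
    if (is.null(key)) {
      key <- code
    } else {
      # Encode the (key, code) pair as one number (computed in doubles,
      # so no integer overflow), then compress it back to 1..m.
      pair <- (key - 1) * max(code) + code
      key <- match(pair, unique(pair))
    }
  }
  key
}

x <- y <- z <- 1:999
id <- combo_id(x, y, z)
length(unique(id))
```

Because the running key is re-compressed after each variable, it never
exceeds the number of rows, so the 999^3 blow-up from the example above
does not occur.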

Hadley

-- 
http://had.co.nz/


