[Rd] reshape() makes R run out of memory (PR#14121)
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Wed Dec 9 21:45:00 CET 2009
abelikoff at gmail.com wrote:
> Full_Name: Alexander L. Belikoff
> Version: 2.8.1
> OS: Ubuntu 9.04 (x86_64)
> Submission from: (NULL) (67.244.71.200)
>
>
> I'm trying to reshape the following data frame:
>
> ID DATE1 DATE2 VALUE_TYPE VALUE
> 'abcd1233' 2009-11-12 2009-12-23 'TYPE1' 123.45
> ...
>
> VALUE_TYPE is a string and is a factor with only 2 values (say TYPE1 and TYPE2).
> I need to transform it into the following data frame ("wide" transpose) based on
> common ID and DATEs:
>
> ID DATE1 DATE2 VALUE.TYPE1 VALUE.TYPE2
> 'abcd1233' 2009-11-12 2009-12-23 123.45 NA
> ...
>
> Using stock reshape() as follows:
>
> tbl2 <- reshape(tbl, direction = "wide", idvar = c("ID", "DATE1", "DATE2"),
> timevar = "VALUE_TYPE");
>
> On a toy data frame this works fine. On a real one with 4.7 million entries
> (although about 70% of VALUEs are NA) it runs out of memory:
>
> Error: cannot allocate vector of size 4.8 Gb
>
> When the real data frame is loaded the R process takes about 200Mb of virtual
> memory. The machine has 4 Gb of RAM.
>
> I've posted a .Rdata file with the data frame in question at
> http://belikoff.net/stuff/other/reshape_test.Rdata.gz
>
>
> P.S. Just checked R 2.10.0 using an Intel PC with 2 Gb RAM running XP Pro
> (32-bit):
>
>> tbl2 <- reshape(tbl, direction = "wide", idvar = c("ID", "DATE1", "DATE2"),
> timevar = "VALUE_TYPE");
> Error: cannot allocate vector of size 53.9 Mb
> In addition: Warning messages:
....
Yes. The culprit would seem to be interaction(), as in
> x <- y <- z <- 1:999
> i <- interaction(x,y,z, drop=TRUE)
Error: cannot allocate vector of size 3.7 Gb
which is happening due to the occurrence of three idvar variables. This
works basically as interaction(x,y,z)[,drop=TRUE], i.e. it first creates
a factor with 999^3 levels, and removes the empty levels afterward.
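
To see the effect at a size that fits in memory, a small-scale sketch along
these lines may help. It uses n = 20 instead of 999 (my choice, purely for
illustration), and the names full and kept are made up for the example. The
comments describe interaction() as discussed in this thread; other R versions
may build the result differently internally, but the level counts should come
out the same.

```r
# Small-scale illustration: n = 20 rather than 999, so it fits in memory.
n <- 20
x <- y <- z <- 1:n

# Without drop, every combination of levels is enumerated: n^3 of them.
full <- interaction(x, y, z)

# With drop = TRUE, only the combinations that actually occur are kept.
kept <- interaction(x, y, z, drop = TRUE)

nlevels(full)   # 8000, i.e. n^3
nlevels(kept)   # 20, since x, y and z are identical here

# The same product at n = 999 is roughly 1e9 levels.
999^3
```

The point is that the cost is driven by the product of the numbers of levels,
not by the number of rows in the data frame.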
In the absence of a better interaction(), you might try making your own
single idvar as do.call("paste",tbl[,c("ID", "DATE1", "DATE2")]) or so.
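
A minimal sketch of that workaround, on a made-up three-row data frame shaped
like the one in the report. The column name KEY and the "|" separator are
assumptions for the example (pick a separator that cannot occur in ID).
I also pass v.names = "VALUE" explicitly; otherwise reshape() would treat ID,
DATE1 and DATE2 as time-varying too, since they are no longer listed in idvar.

```r
# Toy stand-in for the real data; column names follow the original post.
tbl <- data.frame(
  ID         = c("abcd1233", "abcd1233", "efgh5678"),
  DATE1      = as.Date(c("2009-11-12", "2009-11-12", "2009-11-13")),
  DATE2      = as.Date(c("2009-12-23", "2009-12-23", "2009-12-24")),
  VALUE_TYPE = factor(c("TYPE1", "TYPE2", "TYPE1")),
  VALUE      = c(123.45, 67.8, 9.1)
)

# Collapse the three id columns into a single key (KEY is a hypothetical name).
tbl$KEY <- do.call("paste", c(tbl[, c("ID", "DATE1", "DATE2")], sep = "|"))

# With a single idvar, reshape() does not need to call interaction() on it.
tbl2 <- reshape(tbl, direction = "wide", idvar = "KEY",
                timevar = "VALUE_TYPE", v.names = "VALUE")

# The helper key is no longer needed once the data are wide.
tbl2$KEY <- NULL
tbl2
```

On the toy data this gives one row per ID/DATE1/DATE2 combination, with
VALUE.TYPE1 and VALUE.TYPE2 columns and NA where a type is missing. I have
not timed it on the full 4.7 million rows.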
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907