[Rd] Reduce memory peak when serializing to raw vectors
Simon Urbanek
simon.urbanek at r-project.org
Tue Mar 17 22:03:05 CET 2015
Jorge,
what you propose is not possible because the size of the output is not known in advance - that is why a dynamically growing PStream buffer is used, and why it cannot be pre-allocated.
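
To make this concrete: the in-memory stream grows by reallocating as bytes arrive, roughly along these lines (a simplified sketch of the pattern only; membuf_t here is a stand-in, not the actual declarations in serialize.c):

#include <stdlib.h>
#include <string.h>

typedef struct {           /* stand-in for R's internal memory buffer */
    size_t size;           /* bytes currently allocated               */
    size_t count;          /* bytes written so far                    */
    unsigned char *buf;
} membuf_t;

static void out_bytes(membuf_t *mb, const void *data, size_t length)
{
    if (mb->count + length > mb->size) {
        size_t newsize = mb->size ? mb->size : 8192;
        while (mb->count + length > newsize)
            newsize *= 2;                    /* grow geometrically      */
        mb->buf = realloc(mb->buf, newsize); /* error handling omitted  */
        mb->size = newsize;
    }
    memcpy(mb->buf + mb->count, data, length);
    mb->count += length;
}

The total size is only known once the last such call returns, so the destination raw vector cannot be allocated before serialization starts.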
Cheers,
Simon
> On Mar 17, 2015, at 1:37 PM, Martinez de Salinas, Jorge <jorge.martinez-de-salinas at hp.com> wrote:
>
> Hi,
>
> I've been doing some tests using serialize() to a raw vector:
>
> df <- data.frame(runif(50e6, 1, 10))
> ser <- serialize(df, NULL)
>
> In this example the data frame and the serialized raw vector occupy ~400MB each, for a total of ~800MB. However, the memory peak during serialize() is ~1.2GB:
>
> $ cat /proc/15155/status |grep Vm
> ...
> VmHWM: 1207792 kB
> VmRSS: 817272 kB
>
> We work with very large data frames and in many cases this is killing R with an "out of memory" error.
>
> This is the relevant code in R 3.1.3, at src/main/serialize.c:2494:
>
> InitMemOutPStream(&out, &mbs, type, version, hook, fun); /* set up the growable buffer  */
> R_Serialize(object, &out);                               /* stream bytes into out       */
> val = CloseMemOutPStream(&out);                          /* copy buffer to a raw vector */
>
> The serialized object is stored in a buffer pointed to by out.data. Then, in CloseMemOutPStream(), R copies the whole buffer into a newly allocated SEXP (the raw vector that holds the final result):
>
> PROTECT(val = allocVector(RAWSXP, mb->count)); /* allocate the final raw vector    */
> memcpy(RAW(val), mb->buf, mb->count);          /* second full copy of the data     */
> free_mem_buffer(mb);                           /* temporary buffer freed only here */
> UNPROTECT(1);
>
> Just before free_mem_buffer() runs, the process is using ~1.2GB: ~400MB each for the original data frame, the serialization buffer, and the final serialized raw vector.
>
> One possible solution would be to serialize directly into the buffer of the final raw vector instead of into a temporary buffer that is copied afterwards (see the sketch below). This would bring the memory peak down from ~1.2GB to ~800MB.
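>
> Sketched out below (hypothetically - InitRawOutPStream does not exist and
> is only meant to illustrate the idea; it assumes the final size could be
> determined before serialization starts):
>
> /* allocate the destination raw vector up front ...           */
> PROTECT(val = allocVector(RAWSXP, serialized_size));
> /* ... and point a hypothetical output stream straight at it, */
> /* so the bytes land in the vector with no temporary buffer   */
> InitRawOutPStream(&out, RAW(val), type, version, hook, fun);
> R_Serialize(object, &out);
> UNPROTECT(1);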
>
> Thanks,
> -Jorge
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>