[R] save/load doubles memory [oops]
Ross Boylan
ross at biostat.ucsf.edu
Tue Sep 17 21:19:50 CEST 2013
On Tue, 2013-09-17 at 12:06 -0700, Ross Boylan wrote:
> Saving and loading data is roughly doubling memory use. I'm trying to
> understand and correct the problem.
Apparently I had the two processes' memory figures mixed up: R1 below was
the one with 4G and R2 with 2G. So there's less of a mystery. However...
>
> R1 was an R process using just over 2G of memory.
> I did save(r3b, r4, sflist, file="r4.rdata")
> and then, in a new process R2,
> load(file="r4.rdata")
>
> R2 used just under 4G of memory, i.e., almost double the original
> process. The r4.rdata file was just under 2G, which seemed like very
> little compression.
>
> r4 was created by
> r4 <- sflist2stanfit(sflist)
>
> I presume that r4 and sflist shared most of their memory.
> The save() apparently lost the information that the memory was shared,
> doubling memory use.
I'm still wondering whether this loss of sharing is what's going on.
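One way I could test it (just a sketch, not something I've run yet; the
file names are only examples): if save() preserved the sharing, saving r4
and sflist together should produce a file much smaller than the two saved
separately added up. If the sizes come out nearly additive, the sharing
isn't surviving serialization.

save(r4, file="r4_only.rdata")
save(sflist, file="sflist_only.rdata")
save(r4, sflist, file="together.rdata")

sz <- file.info(c("r4_only.rdata", "sflist_only.rdata", "together.rdata"))$size
round(sz / 2^20)   # file sizes in MB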
>
> R 2.15.1, 64 bit on linux.
>
> First, does my diagnosis sound right? The reports of memory use in R2
> are quite a bit lower than the process footprint; is that normal?
> > gc() # after loading data
>             used   (Mb) gc trigger   (Mb)  max used   (Mb)
> Ncells   1988691  106.3    3094291  165.3   2432643  130.0
> Vcells 266976864 2036.9  282174979 2152.9 268661172 2049.8
> > rm("r4")
> > gc()
>             used   (Mb) gc trigger   (Mb)  max used   (Mb)
> Ncells   1949626  104.2    3094291  165.3   2432643  130.0
> Vcells 190689777 1454.9  282174979 2152.9 268661172 2049.8
> > r4 <- sflist2stanfit(sflist)
> > gc()
>             used   (Mb) gc trigger   (Mb)  max used   (Mb)
> Ncells   1970497  105.3    3094291  165.3   2432643  130.0
> Vcells 228827252 1745.9  296363727 2261.1 268661172 2049.8
> >
It seems the recreated r4 used about 300 MB less memory than the one read
in from disk, which suggests that some of the sharing was lost in the
save/load round trip.
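For the record, that estimate is just the difference in the Vcells "used"
column before and after recreating r4. A tiny helper along these lines
(the name and the wrapper are mine, purely hypothetical) makes the same
rough measurement repeatable:

vcell_mb <- function(expr) {
    before <- gc()["Vcells", "used"]
    force(expr)                       # evaluate the expression in the caller
    after <- gc()["Vcells", "used"]
    (after - before) * 8 / 2^20       # Vcells are 8 bytes each; report MB
}
vcell_mb(r4 <- sflist2stanfit(sflist))   # approx. memory attributable to the rebuilt r4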
>
> Even weirder, R1 reports memory use well beyond the memory I show the
> process using (2.2G)
Not a mystery once I matched up the right processes. Actually, I'm a little
surprised the process footprint is smaller than the "max used" figure; I
thought R could not return memory to the OS on Linux.
> > gc()
>             used   (Mb) gc trigger   (Mb)  max used   (Mb)
> Ncells   3640941  194.5    5543382  296.1   5543382  296.1
> Vcells 418720281 3194.6  553125025 4220.1 526708090 4018.5
>
>
> Second, what can I do to avoid the problem?
Now a more modest problem, though still a problem.
>
> I guess in this case I could not save r4 and recreate it, but is there a
> more general solution?
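Concretely, the "don't save r4, recreate it" idea would look something like
this (a sketch, assuming rstan's sflist2stanfit() as in the original
session):

save(r3b, sflist, file="r4.rdata")    # leave the derived r4 out of the file

## then, in the new process:
library(rstan)
load("r4.rdata")
r4 <- sflist2stanfit(sflist)          # rebuild the merged fit, which should
                                      # share memory with sflist again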
>
> If I did myboth <- list(r4, sflist) and
> save(myboth, file="myfile")
> would that be enough to keep the objects together? Judging from the
> size of the file, it seems not.
>
> Even if the myboth trick worked it seems like a kludge.
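If the list kludge is worth trying at all, the quick check is again the
file size (a sketch; "myboth.rdata" is just an example name):

myboth <- list(r4=r4, sflist=sflist)
save(myboth, file="myboth.rdata")
file.info("myboth.rdata")$size / 2^20   # MB; compare with the separate-save
                                        # sizes from the earlier check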
>
> Ross Boylan
>