[Rd] Memory leak with tons of closed connections
Martin Maechler
maechler at stat.math.ethz.ch
Fri Nov 11 12:08:05 CET 2016
>>>>> Gergely Daróczi <daroczig at rapporter.net>
>>>>> on Thu, 10 Nov 2016 16:48:12 +0100 writes:
> Dear All,
> I'm developing an R application running inside a Java daemon on
> multiple threads, interacting with the parent daemon via stdin and
> stdout.
> Everything works fine except for a memory leak somewhere. A
> simplified version of the R app:
> while (TRUE) {
>     con <- file('stdin', open = 'r', blocking = TRUE)  # reopen stdin each iteration
>     line <- scan(con, what = character(0), nlines = 1, quiet = TRUE)
>     close(con)                                         # close it again right away
> }
> This loop uses more and more RAM as time passes (see more on this
> below). I'm not sure why, and I currently have no idea how to debug
> this further. Can someone please try to reproduce it and give me
> some hints on what the problem is?
> Sample bash script to trigger an R process with such memory leak:
> Rscript --vanilla -e "while(TRUE)cat(runif(1),'\n')" | \
>     Rscript --vanilla -e "cat(Sys.getpid(),'\n'); while(TRUE){ con<-file('stdin',open='r',blocking=TRUE); line<-scan(con,what=character(0),nlines=1,quiet=TRUE); close(con); rm(con); gc() }"
> Maybe you have to escape '\n' depending on your shell.
> Thanks for reading this and any hints would be highly appreciated!
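One quick sanity check (a sketch) is that the connections really are
closed on the R side, i.e. that the growth is not caused by open
connections piling up inside R:

    con <- file('stdin', open = 'r', blocking = TRUE)
    close(con)
    showConnections()   # open user connections; this list should stay empty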
I have no hints, sorry... but I can give some more "data":
I've changed the above to *print* the gc() result every 1000th
iteration, and after 100'000 iterations there is still no
memory increase from the point of view of R itself.
However, monitoring the process (via 'htop', e.g.) shows about
1 MB per second increase in the memory footprint of the process.
One could argue that the error is with the OS / pipe / bash
rather than with R itself... but I'm not expert enough to
argue that here.
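To log that OS-level footprint from inside the loop itself, one could
read the kernel's process status directly (a minimal sketch, assuming
Linux, where /proc/self/status exists):

    ## Resident set size of this process in kB (Linux only),
    ## for printing next to R's own gc() figures.
    rss_kb <- function() {
        status <- readLines("/proc/self/status")
        vmrss  <- grep("^VmRSS:", status, value = TRUE)
        as.numeric(gsub("[^0-9]", "", vmrss))
    }
    cat("RSS (kB):", rss_kb(), "\n")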
Here's my version of your sample bash script and its output:
$ Rscript --vanilla -e "while(TRUE)cat(runif(1),'\n')" | \
    Rscript --vanilla -e "cat(Sys.getpid(),'\n'); i <- 0; while(TRUE){ con<-file('stdin',open='r',blocking=TRUE); line<-scan(con,what=character(0),nlines=1,quiet=TRUE); close(con); rm(con); a <- gc(); i <- i+1; if(i %% 1000 == 1) { cat('i=',i,'\\n'); print(a) } }"
11059
i= 1
          used (Mb) gc trigger  (Mb) max used (Mb)
Ncells   83216  4.5   10000000 534.1   213529 11.5
Vcells  172923  1.4   16777216 128.0   562476  4.3
i= 1001
          used (Mb) gc trigger  (Mb) max used (Mb)
Ncells   83255  4.5   10000000 534.1   213529 11.5
Vcells  172958  1.4   16777216 128.0   562476  4.3
.......
i= 80001
          used (Mb) gc trigger  (Mb) max used (Mb)
Ncells   83255  4.5   10000000 534.1   213529 11.5
Vcells  172958  1.4   16777216 128.0   562476  4.3
i= 81001
          used (Mb) gc trigger  (Mb) max used (Mb)
Ncells   83255  4.5   10000000 534.1   213529 11.5
Vcells  172959  1.4   16777216 128.0   562476  4.3
i= 82001
          used (Mb) gc trigger  (Mb) max used (Mb)
Ncells   83255  4.5   10000000 534.1   213529 11.5
Vcells  172959  1.4   16777216 128.0   562476  4.3
i= 83001
          used (Mb) gc trigger  (Mb) max used (Mb)
Ncells   83255  4.5   10000000 534.1   213529 11.5
Vcells  172958  1.4   16777216 128.0   562476  4.3
i= 84001
          used (Mb) gc trigger  (Mb) max used (Mb)
Ncells   83255  4.5   10000000 534.1   213529 11.5
Vcells  172958  1.4   16777216 128.0   562476  4.3
> Best,
> Gergely
> PS1 see the image of memory usage over time posted at
> http://stackoverflow.com/questions/40522584/memory-leak-with-closed-connections
> PS2 the issue doesn't seem to be caused by the first R app writing
> more data than the second one can handle: I tried the same with a
> Sys.sleep(0.01) added to the first app, and in the real application
> this is not an issue at all
> PS3 I also tried using stdin() instead of file('stdin'), but that
> did not work well with the stream running on the multiple threads
> started by the same parent Java daemon
> PS4 I've tried this on Linux using R 3.2.3 and 3.3.2
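For reference, the stdin() variant mentioned in PS3 would look roughly
like this (a sketch only; as noted above, it did not behave well with
the multiple threads started by the Java daemon):

    con <- stdin()   # R-level stdin, opened once and never re-created
    while (TRUE) {
        line <- scan(con, what = character(0), nlines = 1, quiet = TRUE)
        if (length(line) == 0) break   # end of input
    }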
For me, it's Linux, too (Fedora 24), using 'R 3.3.2 patched'.
Martin