[Rd] [External] memory consumption of nested (un)serialize of sys.frames()

luke-tierney using uiowa.edu
Wed Apr 7 16:39:09 CEST 2021


No issues here with that either. Looks like something is different on
your end.

Best,

luke

On Wed, 7 Apr 2021, Andreas Kersting wrote:

> Hi Luke,
>
> Please see https://github.com/akersting/dumpTest for the package.
>
> Here is a session showing my issue:
>
>> library(dumpTest)
>> sessionInfo()
> R version 4.0.5 (2021-03-31)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Debian GNU/Linux 10 (buster)
>
> Matrix products: default
> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
> [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] dumpTest_0.1.0
>
> loaded via a namespace (and not attached):
> [1] compiler_4.0.5
>> for (i in 1:100) {
> +   print(i)
> +   print(system.time(f()))
> + }
> [1] 1
>   user  system elapsed
>  0.028   0.004   0.034
> [1] 2
>   user  system elapsed
>  0.067   0.008   0.075
> [1] 3
>   user  system elapsed
>  0.176   0.000   0.176
> [1] 4
>   user  system elapsed
>  0.335   0.012   0.349
> [1] 5
>   user  system elapsed
>  0.745   0.023   0.770
> [1] 6
>   user  system elapsed
>  1.495   0.060   1.572
> [1] 7
>   user  system elapsed
>  2.902   0.136   3.040
> [1] 8
>   user  system elapsed
>  5.753   0.272   6.034
> [1] 9
>   user  system elapsed
> 11.807   0.708  12.597
> [1] 10
> ^C
> Timing stopped at: 6.638 0.549 7.214
>
> I had to interrupt in iteration 10 because I was running low on RAM.
>
> Regards,
> Andreas
>
> 2021-04-07 15:28 GMT+02:00 luke-tierney using uiowa.edu:
>> On Wed, 7 Apr 2021, Andreas Kersting wrote:
>>
>>> Hi,
>>>
>>> please consider the following minimal reproducible example:
>>>
>>> Create a new R package which just contains the following two (exported) objects:
>>
>> I would not expect this behavior and I don't see it when I make such a
>> package (in R 4.0.3 or R-devel on Ubuntu).  You will need to provide a
>> more complete reproducible example if you want help with what you are
>> trying to do; also sessionInfo() would help.
>>
>> Best,
>>
>> luke
>>
>>>
>>>
>>> crash_dumps <- new.env()
>>>
>>> f <- function() {
>>>  x <- runif(1e5)
>>>  dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL)))
>>>  assign("last.dump", dump, crash_dumps)
>>> }
>>>
>>>
>>> WARNING: the following will probably eat all your RAM!
>>>
>>> Attach this package and run:
>>>
>>> for (i in 1:100) {
>>>  print(i)
>>>  f()
>>> }
>>>
>>> You will notice that with each iteration the execution of f() slows down significantly while the memory consumption of the R process (v4.0.5 on Linux) quickly explodes.
>>>
>>> I am having a hard time understanding what exactly is happening here. Is it something to do with too deeply nested environments? Could someone please enlighten me? Thanks!
>>>
>>> Regards,
>>> Andreas
>>>
>>>
>>> Background:
>>> In an R package I store crash dumps on error in parallel processes in a way similar to what I have just shown (hence the (un)serialize(), which happens as part of returning the objects to the parent process). The first 2 or 3 times I do so in a session everything is fine, but afterwards it takes very long and I soon run out of memory.
>>>
>>> Some more observations:
>>> - If I omit `x <- runif(1e5)`, the issues seem to be less pronounced.
>>> - If I assign to .GlobalEnv instead of crash_dumps, there seems to be no issue - probably because .GlobalEnv is not included in sys.frames(), while crash_dumps is indirectly via the namespace of the package being the parent.env of some of the sys.frames()!?
>>> - If I omit the lapply(...), i.e. use `dump <- unserialize(serialize(sys.frames(), NULL))` directly, there seems to be no issue. The immediate consequence is that there are fewer sys.frames and - in particular - there is no frame which has the base namespace as its parent.env.
>>> - If I make crash_dumps a list and use assignInMyNamespace() to store the dump in it, there also seems to be no issue. I will probably use this as a workaround:
>>>
>>> crash_dumps <- list()
>>>
>>> f <- function() {
>>>  x <- runif(1e5)
>>>  dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL)))
>>>  crash_dumps[["last.dump"]] <- dump
>>>  assignInMyNamespace("crash_dumps", crash_dumps)
>>> }
>>>
>>> ______________________________________________
>>> R-devel using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>
>> --
>> Luke Tierney
>> Ralph E. Wareham Professor of Mathematical Sciences
>> University of Iowa                  Phone:             319-335-3386
>> Department of Statistics and        Fax:               319-335-3017
>>    Actuarial Science
>> 241 Schaeffer Hall                  email:   luke-tierney using uiowa.edu
>> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
>>

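The observations quoted above turn on which environment encloses each active frame (the package namespace, the base namespace, or another frame). A minimal sketch for checking that directly is below. The names `frame_parents` and `g` are hypothetical and not part of dumpTest, and the output depends on where the functions are defined, so treat it as a diagnostic aid rather than an explanation of the behaviour.

```r
# Hypothetical helper: report the name of the enclosing environment of
# every active function frame. environmentName() returns "" for plain
# function frames, "base" for the base namespace, "R_GlobalEnv" for the
# global environment, and the package name for a package namespace.
frame_parents <- function() {
  frames <- sys.frames()  # includes the frame of frame_parents() itself
  vapply(frames, function(fr) environmentName(parent.env(fr)), character(1))
}

# Call it from inside lapply(), mirroring the structure of f() above.
g <- function() lapply(1, function(i) frame_parents())[[1]]

parents <- g()
print(parents)
```

Because lapply() is defined in base, its frame should be reported with "base" as the enclosing environment, which is the frame singled out in the third observation.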

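For comparing the two setups, it may help to record how many bytes each serialize() call produces instead of only timing f(). The sketch below is a variant of the f() quoted above with that measurement added. `f_measured` and `sizes` are hypothetical names. It is written to be run outside a package, so it is an assumption that it behaves like the dumpTest version; it makes no claim about the cause of the slowdown.

```r
# Hypothetical diagnostic variant of f(); not part of the dumpTest package.
crash_dumps <- new.env()

f_measured <- function() {
  x <- runif(1e5)
  sizes <- numeric(0)
  dump <- lapply(1:2, function(i) {
    raw_frames <- serialize(sys.frames(), NULL)
    # Record the size in bytes of this serialization in f_measured()'s frame.
    sizes <<- c(sizes, length(raw_frames))
    unserialize(raw_frames)
  })
  assign("last.dump", dump, crash_dumps)
  sizes
}

# Only a few iterations, since the original loop can exhaust memory.
for (i in 1:3) {
  cat("iteration", i, "bytes:", f_measured(), "\n")
}
```

If the byte counts roughly double from one iteration to the next, earlier dumps are being pulled into later ones. If they stay flat, the growth comes from somewhere else.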

