[R-pkg-devel] Unused data is silently kept in the environment of a function
Samuel Granjeaud
@@mue|@gr@nje@ud @end|ng |rom |n@erm@|r
Fri Jul 8 15:50:07 CEST 2022
Dear all,
I want to build processing functions from the data and then apply
them to the data in a second step.
In the reprex below, proc_0 increases memory use, whereas proc_1 does not.
If this behavior is known, could you tell me a workaround before I try
to guess the best one?
Best,
Samuel
``` r
# for memory tracking
library(pryr)
# a class
setClass(
"fb",
slots = list(d = "numeric", f = "list"),
prototype=list(d = NULL, f = NULL)
)
# memory increases: dat is kept somewhere and linked to the returned value
proc_0 <- function(x) {
  dat = sample(x@d)
  cofactors = c(mean(dat), median(dat), IQR(dat))
  model = sapply(cofactors, function(cofactor) function(z) z / cofactor)
  x@f = list(model)
  x
}
# init data
mem_used()
#> 47 MB
a = new("fb")
a@d = sample(rnorm(1e7))
a@f = list()
mem_used()
#> 127 MB
# memory increased by 80 MB
# process
b = proc_0(a)
mem_used()
#> 207 MB
# memory increased by 80 MB again
rm(a)
mem_used()
#> 207 MB
# memory did not decrease
b@d = b@d + 1
mem_used()
#> 287 MB
# memory increased
# b@d was really pointing to a@d before the increment
sapply(1:3, function(i) ls(environment(b@f[[1]][[i]])))
#> [1] "cofactor" "cofactor" "cofactor"
sapply(1:3, function(i) get("cofactor", environment(b@f[[1]][[i]])))
#> [1] -0.0003085559 0.0001107148 1.3485980291
# environments look fine
rm(b)
mem_used()
#> 47.5 MB
# memory released back
# memory safe
proc_1 <- function(x) {
  cofactors = c(mean(x@d), median(x@d), IQR(x@d))
  model = sapply(cofactors, function(cofactor) function(z) z / cofactor)
  x@f = list(model)
  x
}
# init data
mem_used()
#> 47.5 MB
a = new("fb")
a@d = sample(rnorm(1e7))
a@f = list()
mem_used()
#> 128 MB
b = proc_1(a)
mem_used()
#> 128 MB
# memory did not increase; b@d points to a@d; the functions weigh a few KB
rm(a)
mem_used()
#> 128 MB
sapply(1:3, function(i) ls(environment(b@f[[1]][[i]])))
#> [1] "cofactor" "cofactor" "cofactor"
sapply(1:3, function(i) get("cofactor", environment(b@f[[1]][[i]])))
#> [1] -0.0003133312 -0.0002510665 1.3491459433
rm(b)
mem_used()
#> 47.5 MB
```
<sup>Created on 2022-07-08 by the [reprex
package](https://reprex.tidyverse.org) (v2.0.1)</sup>
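
A possible explanation for the proc_0 result, offered as a minimal sketch rather than a confirmed diagnosis: the environments inspected in the reprex only contain `cofactor`, but each of them has a parent, namely the execution environment of the function in which the closures were created, and that parent still holds every local variable, including the large temporary. The name `make_models` is hypothetical and stands in for proc_0 without the S4 class; only base R is used.

``` r
# Hypothetical stand-in for proc_0: the closures are created inside a
# function that also holds a large temporary vector.
make_models <- function(d) {
  dat <- sample(d)   # large temporary, like `dat` in proc_0
  cofactors <- c(mean(dat), median(dat), IQR(dat))
  lapply(cofactors, function(cofactor) function(z) z / cofactor)
}

models <- make_models(rnorm(1e5))

# The closure's own environment only contains `cofactor` ...
env <- environment(models[[1]])
print(ls(env))

# ... but its parent is the execution environment of make_models(),
# which is kept alive by the closure and still contains `dat`.
penv <- parent.env(env)
print(ls(penv))
print(length(get("dat", envir = penv)))
```

If this reading is right, the copy made by `sample()` stays reachable for as long as any of the returned functions is reachable, which would match the memory that is only released after `rm(b)`.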
<details style="margin-bottom:10px;">
<summary>
Session info
</summary>
``` r
sessionInfo()
#> R version 4.2.1 (2022-06-23 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=French_France.utf8 LC_CTYPE=French_France.utf8
#> [3] LC_MONETARY=French_France.utf8 LC_NUMERIC=C
#> [5] LC_TIME=French_France.utf8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] pryr_0.1.5
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.8.3 codetools_0.2-18 digest_0.6.29 withr_2.5.0
#> [5] magrittr_2.0.3 reprex_2.0.1 evaluate_0.15 highr_0.9
#> [9] stringi_1.7.6 rlang_1.0.3 cli_3.3.0 rstudioapi_0.13
#> [13] fs_1.5.2 lobstr_1.1.2 rmarkdown_2.14 tools_4.2.1
#> [17] stringr_1.4.0 glue_1.6.2 xfun_0.31 yaml_2.3.5
#> [21] fastmap_1.1.0 compiler_4.2.1 htmltools_0.5.2 knitr_1.39
```
</details>
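
One workaround that could be tried, again only as a sketch under assumptions: create the closures through a small factory defined outside the processing function, so that their parent environment is the one where the factory was defined instead of the frame holding the temporary data. The names `make_scaler` and `build_models` are hypothetical and not part of the reprex.

``` r
# Hypothetical factory defined at top level: closures created here capture
# only `cofactor`; their parent is the environment where make_scaler was
# defined, not the frame of the function that calls it.
make_scaler <- function(cofactor) {
  force(cofactor)   # evaluate now so no unevaluated promise lingers
  function(z) z / cofactor
}

# Hypothetical stand-in for proc_0 that delegates closure creation.
build_models <- function(d) {
  dat <- sample(d)
  cofactors <- c(mean(dat), median(dat), IQR(dat))
  lapply(cofactors, make_scaler)
}

models <- build_models(rnorm(1e5))

env <- environment(models[[1]])
print(ls(env))
# The parent is the factory's defining environment, so `dat` is not captured.
print(identical(parent.env(env), environment(make_scaler)))
```

Another option, assuming the rest of the function no longer needs the temporary, would be to call `rm(dat)` inside the processing function just before returning.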