[Bioc-devel] how to minimally serialize a FilterRules object
Robert Castelo
robert.castelo at upf.edu
Wed Jul 5 18:59:47 CEST 2017
dear developers,
in the framework of a package i maintain, VariantFiltering, i'm using
the 'FilterRules' class defined in the S4Vectors package and i'm
interested in serializing (e.g., saving to disk via 'saveRDS()')
'FilterRules' objects where some rules may be defined using functions.
my problem is that the resulting RDS files take much more space than
expected because apparently the environment of the functions is also
serialized.
a toy example reproducing the situation could be the following:
library(S4Vectors)
## define a function that creates a ~7Mb numerical vector
## and returns a FilterRules object on a function that has
## nothing to do with this vector, except for sharing its
## environment. this tries to reproduce the situation in which
## a 'FilterRules' object is defined within the package
## 'VariantFiltering' where the environment is full of stuff
## unrelated to the 'FilterRules' object being created.
f <- function() {
  z <- rnorm(1000000)
  g <- function(x) 2*x
  fr <- FilterRules(list(g=g))
  fr
}
## call the previous function to get the FilterRules object
fr <- f()
## while the 'FilterRules' object takes 3.3 Kb ...
print(object.size(fr), units="Kb")
3.3 Kb
## ... serializing it takes ~7Mb
print(object.size(serialize(fr, NULL)), units="Mb")
7.6 Mb
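as far as i can tell, the same effect shows up with a plain closure and no 'FilterRules' object at all, which makes me think it comes from the function's environment rather than from the class. a minimal sketch in base R only (the names 'make_closure', 'big' and 'h' are made up for this illustration and are not part of any package):

```r
## hypothetical helper: returns a closure whose enclosing
## environment also holds a large, unrelated vector
make_closure <- function() {
  big <- rnorm(1000000)   # one million doubles, 8 bytes each
  function(x) 2 * x       # never uses 'big', but shares its environment
}
h <- make_closure()

## the enclosing environment of 'h' still contains 'big'
vars_in_env <- ls(environment(h))

## object.size() does not include the function's environment,
## whereas serialize() writes the environment out as well
size_in_memory  <- as.numeric(object.size(h))
size_serialized <- length(serialize(h, NULL))

print(vars_in_env)
print(size_in_memory < size_serialized)
```

if this reading is right, the difference between the two sizes is simply the vector 'big' travelling along with the function.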
i guess this is the expected behavior behind functions and environments,
but after reading about this subject (e.g.,
http://adv-r.had.co.nz/Environments.html) i still haven't been able to
figure out how to serialize the 'FilterRules' object without the
associated environment or with a minimal one without unnecessary objects
around.
i'm sure many of you will have an easy workaround for this. any help
will be highly appreciated.
thanks!!
robert.