[Bioc-devel] how to minimally serialize a FilterRules object

Robert Castelo robert.castelo at upf.edu
Wed Jul 5 18:59:47 CEST 2017


dear developers,

in the framework of a package i maintain, VariantFiltering, i'm using 
the 'FilterRules' class defined in the S4Vectors package and i'm 
interested in serializing (e.g., saving to disk via 'saveRDS()') 
'FilterRules' objects where some rules may be defined using functions.

my problem is that the resulting RDS files take much more space than 
expected because apparently the environment of the functions is also 
serialized.

a toy example reproducing the situation could be the following:

library(S4Vectors)

## define a function that creates a ~7Mb numerical vector
## and returns a FilterRules object on a function that has
## nothing to do with this vector, except for sharing its
## environment. this tries to reproduce the situation in which
## a 'FilterRules' object is defined within the package
## 'VariantFiltering' where the environment is full of stuff
## unrelated to the 'FilterRules' object being created.

f <- function() {
   z <- rnorm(1000000)
   g <- function(x) 2*x
   fr <- FilterRules(list(g=g))
   fr
}


## call the previous function to get the FilterRules object

fr <- f()


## while the 'FilterRules' object takes 3.3 Kb ...

print(object.size(fr), units="Kb")
3.3 Kb


## ... serializing it takes ~7Mb

print(object.size(serialize(fr, NULL)), units="Mb")
7.6 Mb
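
the effect does not seem to be specific to 'FilterRules'. the following is a minimal base R sketch of what i assume is the same mechanism: a function created inside another function keeps its enclosing environment, and 'serialize()' writes that environment out with it. the names 'make_g', 'g_local' and 'g_global' are made up for this illustration and are not part of S4Vectors or VariantFiltering.

```r
## base R only: a closure created inside a function keeps a reference
## to its enclosing environment, including the unrelated vector 'z'
make_g <- function() {
   z <- rnorm(1000000)   ## 1e6 doubles, never used by g
   g <- function(x) 2*x
   g
}

g_local <- make_g()

## the enclosing environment of the closure still holds 'z'
ls(environment(g_local))

## number of bytes of the serialized closure (this includes 'z')
size_local <- length(serialize(g_local, NULL))

## the same function body defined at top level: when run from the
## global environment, that environment is serialized as a reference
## only, so none of its contents are written out
g_global <- function(x) 2*x
size_global <- length(serialize(g_global, NULL))

print(size_local)
print(size_global)
```

if this sketch reflects what happens inside 'FilterRules', the extra ~7Mb come from 'z' living in the environment of 'g', not from the 'FilterRules' object itself.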


i guess this is the expected behavior behind functions and environments, 
but after reading about this subject (e.g., 
http://adv-r.had.co.nz/Environments.html) i still haven't been able to 
figure out how to serialize the 'FilterRules' object without the 
associated environment or with a minimal one without unnecessary objects 
around.

i'm sure many of you will have an easy workaround for this. any help 
will be highly appreciated.


thanks!!

robert.


