[Bioc-devel] how to minimally serialize a FilterRules object

Martin Morgan martin.morgan at roswellpark.org
Wed Jul 5 20:39:20 CEST 2017


On 07/05/2017 12:59 PM, Robert Castelo wrote:
> dear developers,
> 
> in the framework of a package i maintain, VariantFiltering, i'm using 
> the 'FilterRules' class defined in the S4Vectors package and i'm 
> interested in serializing (e.g., saving to disk via 'saveRDS()') 
> 'FilterRules' objects where some rules may be defined using functions.
> 
> my problem is that the resulting RDS files take much more space than 
> expected because apparently the environment of the functions is also 
> serialized.
> 
> a toy example reproducing the situation could be the following:
> 
> library(S4Vectors)
> 
> ## define a function that creates a ~7Mb numerical vector
> ## and returns a FilterRules object on a function that has
> ## nothing to do with this vector, except for sharing its
> ## environment. this tries to reproduce the situation in which
> ## a 'FilterRules' object is defined within the package
> ## 'VariantFiltering' where the environment is full of stuff
> ## unrelated to the 'FilterRules' object being created.
> 
> f <- function() {
>    z <- rnorm(1000000)
>    g <- function(x) 2*x

I guess

     g <- function(x) 2 * x > 10

or similar would satisfy the FilterRules requirement that a rule return a 
logical vector of the same length as its input


>    fr <- FilterRules(list(g=g))
>    fr
> }
> 
> 
> ## call the previous function to get the FilterRules object
> 
> fr <- f()
> 
> 
> ## while the 'FilterRules' object takes 3.3 Kb ...
> 
> print(object.size(fr), units="Kb")
> 3.3 Kb
> 
> 
> ## ... serializing it takes ~7Mb
> 
> print(object.size(serialize(fr, NULL)), units="Mb")
> 7.6 Mb
> 

I added the test case

   testthat::expect_equal(eval(fr, 1:10), rep(c(FALSE, TRUE), each=5))

> i guess this is the expected behavior behind functions and environments, 
> but after reading about this subject (e.g., 
> http://adv-r.had.co.nz/Environments.html) i still haven't been able to 
> figure out how to serialize the 'FilterRules' object without the 
> associated environment or with a minimal one without unnecessary objects 
> around.
> 
> i'm sure many of you will have an easy workaround for this. any help 
> will be highly appreciated.

One possibility is to set the environment of g() to something that 
resolves appropriate symbols, e.g.,

f <- function() {
     z <- rnorm(1000000)
     g <- function(x) 2 * x > 10
     environment(g) <- baseenv()
     FilterRules(list(g=g))
}

The serialized size is then 11 Kb and the test continues to pass. The 
environment needs to be baseenv() to resolve `*` and `>`; emptyenv() is 
too restrictive. A package namespace might often be appropriate (though 
maybe large).
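
To make the effect of the enclosing environment concrete, here is a minimal 
sketch using base R only (no FilterRules involved). The names make_closure(), 
strip and ser_size() are hypothetical helpers introduced for illustration; 
the exact byte counts will vary, so only rough magnitudes are noted.

```r
## Hypothetical helper: returns a closure, optionally re-homed in baseenv().
make_closure <- function(strip = FALSE) {
  z <- rnorm(1000000)              # large vector unrelated to g
  g <- function(x) 2 * x > 10
  if (strip)
    environment(g) <- baseenv()    # `*` and `>` are still found here
  g
}

## Hypothetical helper: number of bytes in the serialized object.
ser_size <- function(obj) length(serialize(obj, NULL))

big   <- ser_size(make_closure())              # drags z along (several Mb)
small <- ser_size(make_closure(strip = TRUE))  # no z, far smaller

## With emptyenv() the operators can no longer be resolved, so the call fails.
g0 <- make_closure(strip = TRUE)
environment(g0) <- emptyenv()
res <- tryCatch(g0(1:10), error = function(e) conditionMessage(e))
```

If the function needs a few helper objects, an environment created with 
new.env(parent = baseenv()) and populated with just those objects should 
behave similarly, at the cost of serializing those objects too.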

Maybe that's a hack, and Michael or others will chime in with something 
better...

Martin

> 
> 
> thanks!!
> 
> robert.
