[Bioc-devel] how to minimally serialize a FilterRules object

Robert Castelo robert.castelo at upf.edu
Wed Jul 5 23:12:49 CEST 2017


On 05/07/2017 20:39, Martin Morgan wrote:
> On 07/05/2017 12:59 PM, Robert Castelo wrote:
>> dear developers,
>>
>> in the framework of a package i maintain, VariantFiltering, i'm using 
>> the 'FilterRules' class defined in the S4Vectors package and i'm 
>> interested in serializing (e.g., saving to disk via 'saveRDS()') 
>> 'FilterRules' objects where some rules may be defined using functions.
>>
>> my problem is that the resulting RDS files take much more space than 
>> expected because apparently the environment of the functions is also 
>> serialized.
>>
>> a toy example reproducing the situation could be the following:
>>
>> library(S4Vectors)
>>
>> ## define a function that creates a ~7Mb numerical vector
>> ## and returns a FilterRules object on a function that has
>> ## nothing to do with this vector, except for sharing its
>> ## environment. this tries to reproduce the situation in which
>> ## a 'FilterRules' object is defined within the package
>> ## 'VariantFiltering' where the environment is full of stuff
>> ## unrelated to the 'FilterRules' object being created.
>>
>> f <- function() {
>>    z <- rnorm(1000000)
>>    g <- function(x) 2*x
>
> I guess
>
>     g <- function(x) 2 * x > 10
>
> or similar would satisfy the requirement of FilterRules that the 
> function return a logical vector of the same length as its input
>
>
oops, yes of course.

>>    fr <- FilterRules(list(g=g))
>>    fr
>> }
>>
>>
>> ## call the previous function to get the FilterRules object
>>
>> fr <- f()
>>
>>
>> ## while the 'FilterRules' object takes 3.3 Kb ...
>>
>> print(object.size(fr), units="Kb")
>> 3.3 Kb
>>
>>
>> ## ... serializing it takes ~7Mb
>>
>> print(object.size(serialize(fr, NULL)), units="Mb")
>> 7.6 Mb
>>
>
> I added the test case
>
>   testthat::expect_equal(eval(fr, 1:10), rep(c(FALSE, TRUE), each=5))
>
but then

g <- function(x) x > 5

which is good for simplicity

>> i guess this is the expected behavior behind functions and 
>> environments, but after reading about this subject (e.g., 
>> http://adv-r.had.co.nz/Environments.html) i still haven't been able 
>> to figure out how to serialize the 'FilterRules' object without the 
>> associated environment or with a minimal one without unnecessary 
>> objects around.
>>
>> i'm sure many of you will have an easy workaround for this. any help 
>> will be highly appreciated.
>
> One possibility is to set the environment of g() to something that 
> resolves appropriate symbols, e.g.,
>
> f <- function() {
>     z <- rnorm(1000000)
>     g <- function(x) 2 * x > 10
>     environment(g) <- baseenv()
>     FilterRules(list(g=g))
> }
>
> the serialized size is then 11 kb and the test continues to pass. The 
> environment needs to be baseenv to resolve `*` and `>`; emptyenv() is 
> too restrictive. A package name space might often be appropriate 
> (though maybe large).
>
> Maybe that's a Hack, and Michael or others will chime in with 
> something better...
>
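the point about emptyenv() versus baseenv() can be checked with plain functions, without involving FilterRules at all. this is only a minimal sketch in base R, and the names h_empty and h_base are made up for illustration:

```r
## the same rule body, enclosed by two different environments
h_empty <- function(x) 2 * x > 10
h_base  <- function(x) 2 * x > 10

environment(h_empty) <- emptyenv()  # nothing can be looked up from here
environment(h_base)  <- baseenv()   # `*` and `>` are found in base

## calling h_empty() fails because even `*` and `>` cannot be resolved
empty_fails <- inherits(tryCatch(h_empty(1:10), error = function(e) e),
                        "error")

## calling h_base() works and gives one logical value per input element
base_result <- h_base(1:10)
```

so the enclosing environment only has to be large enough to resolve the symbols the rule actually uses.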
thanks!! indeed this reduces the size down to 1 kb:

f <- function() {
   z <- rnorm(1000000)
   g <- function(x) x > 5
   environment(g) <- baseenv()
   fr <- FilterRules(list(g=g))
   fr
}

fr <- f()
testthat::expect_equal(eval(fr, 1:10), rep(c(FALSE, TRUE), each=5))

print(object.size(fr), units="Kb")
1Kb
print(object.size(serialize(fr, NULL)), units="Kb")
1Kb
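
the same effect can be seen without S4Vectors, which suggests it comes from how R serializes closures rather than from FilterRules itself. a minimal base R sketch follows; make_rule is a hypothetical helper, not part of any package, and exact byte counts will vary between R versions:

```r
make_rule <- function(strip_env = FALSE) {
  z <- rnorm(1000000)            # large object unrelated to the rule
  g <- function(x) x > 5
  if (strip_env)
    environment(g) <- baseenv()  # drop the link to this frame (and to z)
  g
}

## serialized size in bytes with and without the enclosing frame
size_with_env    <- length(serialize(make_rule(), NULL))
size_without_env <- length(serialize(make_rule(strip_env = TRUE), NULL))

## the rule itself behaves the same in both cases
same_result <- identical(make_rule()(1:10), make_rule(TRUE)(1:10))
```

here size_with_env should be on the order of megabytes because z travels along with g, while size_without_env should be tiny.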

how would i set the environment of the function to a package namespace?

wouldn't it make more sense to leave it with baseenv() and call 
'require(pkg)' within the function to load whatever the function needs 
from package 'pkg'?
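
one option that seems to work for the namespace case is asNamespace(). as far as i understand, R serializes a namespace environment as a reference to the package rather than by value, so the result stays small while symbols from that package are still resolved. a fully qualified call with `::` on top of baseenv() is an alternative that avoids require(). the sketch below uses the stats package only because it ships with R; make_rule_ns and make_rule_qualified are hypothetical names:

```r
## option 1: enclose the rule in a package namespace
make_rule_ns <- function() {
  z <- rnorm(1000000)                     # unrelated large object
  g <- function(x) x > median(x)          # median() lives in stats
  environment(g) <- asNamespace("stats")
  g
}

## option 2: keep baseenv() and qualify the call explicitly
make_rule_qualified <- function() {
  z <- rnorm(1000000)
  g <- function(x) x > stats::median(x)   # `::` itself is found in base
  environment(g) <- baseenv()
  g
}

g_ns        <- make_rule_ns()
g_qualified <- make_rule_qualified()

## both serialize compactly and survive a round trip
bytes_ns        <- length(serialize(g_ns, NULL))
bytes_qualified <- length(serialize(g_qualified, NULL))
g_ns_restored   <- unserialize(serialize(g_ns, NULL))

result_ns        <- g_ns_restored(1:10)
result_qualified <- g_qualified(1:10)
```

with a qualified call the dependency is visible in the rule body itself, while with a namespace the rule can use any internal, non-exported helper of that package.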

robert.

> Martin
>
>>
>>
>> thanks!!
>>
>> robert.


