[Bioc-devel] how to minimally serialize a FilterRules object

Vincent Carey stvjc at channing.harvard.edu
Wed Jul 5 19:44:05 CEST 2017


Interesting.  I am finding that fr@listData$g is extremely slow and returns
z in a peculiar format:

> fr@listData$g
filter (fr = <S4 object of class "FilterRules">, g = c("function (x) ", "2 * x"),
z = c("c(2.25030343990823, 0.0689130508451947, 0.898164844240903, -0.269136190861786, ",
"0.286806384719545, -0.61182037109871, 0.653056951088029, -1.48136241526067, ",
"-1.06150631982182, 0.988130188095894, 0.245748890666696, 0.203625470891459, ",
"0.563198830428634, 1.16048203861639, 0.116128059607538, -0.949976682548964, ",
"0.590987242729504, 1.56586236379949, 2.65190924918386, 0.395066113147369, ",
"1.14339356797857, -1.38492856542597, -0.309354770689183, -0.678645873097042, ",
"-1.45826853611657, 0.40146829174388, 1.78560892418892, -0.652872116524565, ",
"-

> fr <- f()
> fr
FilterRules of length 1
names(1): g
> fr@listData
$g
function(x) 2*x
<environment: 0x3280a20>
attr(,"class")
[1] "FilterClosure"
attr(,"class")attr(,"package")
[1] "S4Vectors"

fr@listData$g
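
For what it's worth, here is a minimal base R sketch of one possible
workaround. It assumes the filter function needs nothing from the frame in
which it was defined. The helper names make_big() and make_small() are
hypothetical and used only for illustration. The idea is to re-parent the
closure to the global environment before returning it, so that the frame
holding 'z' is not reachable from the function and is not written out by
serialize(). I have not checked this against FilterRules itself; presumably
the same assignment placed just before FilterRules(list(g=g)) would have the
same effect, but that is an assumption.

```r
## Closure created inside a function: its enclosing environment is the
## frame of make_big(), which also holds the large vector 'z'.
make_big <- function() {
  z <- rnorm(1000000)
  g <- function(x) 2*x
  g
}

## Same function, but re-parented to the global environment before it is
## returned, so the frame holding 'z' is no longer reachable from 'g'.
make_small <- function() {
  z <- rnorm(1000000)
  g <- function(x) 2*x
  environment(g) <- globalenv()
  g
}

g_big <- make_big()
g_small <- make_small()

## 'g' and 'z' both live in the frame captured by g_big
ls(environment(g_big))

## compare the number of bytes produced by serialize()
size_big <- length(serialize(g_big, NULL))
size_small <- length(serialize(g_small, NULL))
print(c(big = size_big, small = size_small))
```

The global environment is serialized as a reference rather than by content,
which is why the second closure comes out small. The obvious caveat is that
a function relying on variables from its defining frame would break after
this change, so it only suits self-contained filters such as the toy 'g'.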

On Wed, Jul 5, 2017 at 12:59 PM, Robert Castelo <robert.castelo at upf.edu>
wrote:

> dear developers,
>
> in the framework of a package i maintain, VariantFiltering, i'm using the
> 'FilterRules' class defined in the S4Vectors package and i'm interested in
> serializing (e.g., saving to disk via 'saveRDS()') 'FilterRules' objects
> where some rules may be defined using functions.
>
> my problem is that the resulting RDS files take much more space than
> expected because apparently the environment of the functions is also
> serialized.
>
> a toy example reproducing the situation could be the following:
>
> library(S4Vectors)
>
> ## define a function that creates a ~7Mb numerical vector
> ## and returns a FilterRules object on a function that has
> ## nothing to do with this vector, except for sharing its
> ## environment. this tries to reproduce the situation in which
> ## a 'FilterRules' object is defined within the package
> ## 'VariantFiltering' where the environment is full of stuff
> ## unrelated to the 'FilterRules' object being created.
>
> f <- function() {
>   z <- rnorm(1000000)
>   g <- function(x) 2*x
>   fr <- FilterRules(list(g=g))
>   fr
> }
>
>
> ## call the previous function to get the FilterRules object
>
> fr <- f()
>
>
> ## while the 'FilterRules' object takes 3.3 Kb ...
>
> print(object.size(fr), units="Kb")
> 3.3 Kb
>
>
> ## ... serializing it takes ~7Mb
>
> print(object.size(serialize(fr, NULL)), units="Mb")
> 7.6 Mb
>
>
> i guess this is the expected behavior behind functions and environments,
> but after reading about this subject (e.g.,
> http://adv-r.had.co.nz/Environments.html) i still haven't been able to
> figure out how to serialize the 'FilterRules' object without the associated
> environment or with a minimal one without unnecessary objects around.
>
> i'm sure many of you will have an easy workaround for this. any help will
> be highly appreciated.
>
>
> thanks!!
>
> robert.
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
