[Rd] Different serialization of functions in interactive vs. batch mode

Gabriel Becker gmbecker at ucdavis.edu
Wed Feb 18 16:03:43 CET 2015


On Wed, Feb 18, 2015 at 6:43 AM, Holger Hoefling <hhoeflin at gmail.com> wrote:

>
> b) More seriously, as.list strips the function's environment (and thus
> the associated information) as well as information about parent
> environments. For executing a function, however, this would be
> crucial. This is also why, in my opinion, a pure "deparse" alone would
> not work.
>

Well, I agree it can be, but it depends heavily on the function. For
functions that do not refer to objects in their closure (which covers most R
functions), this would not be a problem. One can easily write functions
where it would be, however.
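A minimal illustration of such a function (the names `make_scaler`, `times2` and `times3` are hypothetical): two closures produced by the same factory have identical code, so deparse() cannot tell them apart, yet they behave differently because each captures its own `k` in its enclosing environment.

```r
# Hypothetical factory: every closure it returns has the same code,
# but each one captures its own value of k in its enclosing environment.
make_scaler <- function(k) {
  force(k)                 # evaluate k now rather than lazily
  function(x) x * k
}

times2 <- make_scaler(2)
times3 <- make_scaler(3)

identical(deparse(times2), deparse(times3))  # TRUE: the code alone is the same
times2(10)                                   # 20, uses k = 2
times3(10)                                   # 30, uses k = 3
```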

You can always deparse the function and hash the environment separately.
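A minimal sketch of that approach in base R. The helper name `function_key` is hypothetical, and treating the global environment as "no captured state" is a design assumption, not something proposed in this thread. With its default settings, deparse() rebuilds the code from the parsed expression rather than from stored source text, so the key should not depend on the keep.source setting; only the enclosing frame is captured, not its parents.

```r
# Hypothetical helper: a hashable description of a closure that does not
# depend on stored source references (srcref).
function_key <- function(f) {
  stopifnot(is.function(f))
  # Default deparse() regenerates the code from the expression, ignoring srcref.
  code <- deparse(f)
  env <- environment(f)
  # Assumption: top-level functions (and primitives, whose environment is
  # NULL) carry no captured state worth hashing.
  vars <- if (is.null(env) || identical(env, globalenv())) {
    list()
  } else {
    # Bindings of the enclosing frame only; ls() returns the names sorted.
    # Parent environments are deliberately ignored in this sketch.
    mget(ls(env, all.names = TRUE), envir = env)
  }
  list(code = code, env = vars)
}
```

If the digest package is installed, `digest::digest(function_key(testfun))` would then give a hash that, under these assumptions, is the same whether or not source references were kept.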

Also remember that environments are only "sort of" serialized uniquely to
begin with:

> env1 = new.env()
> assign("x", 5, env1)
> env2 = new.env()
> assign("x", 5, env2)
> f = function(x) NULL
> z = f
> environment(f) = env1
> environment(z) = env2
> library(digest)
> digest(f)
[1] "892edaa1aff5cab503a6908617728827"
> digest(z)
[1] "892edaa1aff5cab503a6908617728827"

Only their contents matter, not which environment they actually are:

> assign("y", 3, env2)
> digest(z)
[1] "29b29c33c3c50f8bcfe1820621a5cf1f"


This may be what you actually want for your use-case, but it's something to
keep in mind.
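The same behavior can be checked without the digest package by comparing the raw serialize() output directly. This is a sketch; the explicit `parent = globalenv()` is an addition so that the result does not depend on where the code is run. The two closures serialize to identical bytes even though identical() reports that their environments are different objects.

```r
# Two distinct environments with the same contents.
env1 <- new.env(parent = globalenv())
assign("x", 5, envir = env1)
env2 <- new.env(parent = globalenv())
assign("x", 5, envir = env2)

f <- function(x) NULL
z <- f
environment(f) <- env1
environment(z) <- env2

# Serialization records the contents of an environment, not its identity.
same_bytes <- identical(serialize(f, NULL), serialize(z, NULL))
# identical() on two environments compares identity.
same_env <- identical(environment(f), environment(z))

# Adding a binding to one environment changes its serialized form.
assign("y", 3, envir = env2)
same_bytes_after <- identical(serialize(f, NULL), serialize(z, NULL))
```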

~G

>
> Thanks
>
> Holger
>
>
> On Wed, Feb 18, 2015 at 3:36 PM, Gabriel Becker <gmbecker at ucdavis.edu> wrote:
> > Holger,
> >
> > For me (see session info) using
> >
> > digest(as.list(f))
> >
> > gets around this problem.
> >
> > ~G
> >
> >> sessionInfo()
> > R version 3.1.0 (2014-04-10)
> > Platform: x86_64-pc-linux-gnu (64-bit)
> >
> > locale:
> >  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> >  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> >  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> >  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
> >  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats     graphics  grDevices utils     datasets  methods   base
> >
> > other attached packages:
> > [1] digest_0.6.8
> >
> >
> >
> >
> > On Wed, Feb 18, 2015 at 6:22 AM, Holger Hoefling <hhoeflin at gmail.com> wrote:
> >>
> >> Hi Luke,
> >>
> >> Ah, I see, thank you! This at least points me to a way to "fix"
> >> this. I tried setting the srcref attribute to NULL, but the hash
> >> value is still different, and so is the serialization. So this looks
> >> like it is one difference, but not all of them.
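A plausible explanation, judging from the interactive serialization quoted in this thread (it contains the strings "srcref", "srcfile" and "wholeSrcref"): when source references are kept, the `{` call that forms the function body carries its own srcref-related attributes, so clearing the function's `srcref` attribute alone leaves those in place. The sketch below assumes a recent R in which utils::removeSource() is available (it may not exist in older releases such as the R 3.1.0 used here); the parse() call only simulates a session with keep.source = TRUE.

```r
# Simulate an interactive definition: parse the text with source references kept.
f <- eval(parse(text = "function() {\n    return(NULL)\n}", keep.source = TRUE)[[1]])

attr(f, "srcref") <- NULL        # the step that was tried in this message
names(attributes(body(f)))       # the body still has srcref-related attributes

# utils::removeSource() strips source references from the body as well.
g <- utils::removeSource(f)
attributes(body(g))              # NULL: no source-related attributes remain
```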
> >>
> >> Even if all the differences were identified, it would still leave me
> >> with different behavior between interactive and batch mode. I am
> >> curious as to why that is. Do you know why the srcref attribute is
> >> set in interactive mode but not in batch mode?
> >>
> >> Thanks!
> >>
> >> Holger
> >>
> >> P.S. I attached the output I get when I set the attributes to NULL.
> >>
> >>
> >> On Wed, Feb 18, 2015 at 3:04 PM,  <luke-tierney at uiowa.edu> wrote:
> >> > Add
> >> >
> >> > attributes(testfun)
> >> >
> >> > and you will see where the two functions differ.
> >> >
> >> > luke
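Luke's check can be reproduced in a single session by parsing the same definition with and without source references. The keep.source option is documented as defaulting to interactive(), i.e. TRUE in an interactive session and FALSE when input is redirected from a file; the sketch assumes this is what separates the two runs. The object names are hypothetical.

```r
src <- "function() {\n    return(NULL)\n}"

# As an interactive session would parse it by default (keep.source = TRUE).
fun_interactive <- eval(parse(text = src, keep.source = TRUE)[[1]])
# As a batch run (R < file) would parse it by default (keep.source = FALSE).
fun_batch <- eval(parse(text = src, keep.source = FALSE)[[1]])

attributes(fun_interactive)   # a "srcref" attribute
attributes(fun_batch)         # NULL

# The setting that applies to newly parsed code in the current session.
getOption("keep.source")
```

Calling `options(keep.source = FALSE)` at the start of the interactive session, before defining the functions, should make them parse as in batch mode; this is untested here, and other differences could remain.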
> >> >
> >> >
> >> > On Wed, 18 Feb 2015, Holger Hoefling wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> I posted this question to the regular help list, but it seems
> >> >> this is probably a question that is better addressed on r-devel.
> >> >> Sorry for the double posting.
> >> >>
> >> >> I am using hash values to cache certain results in R. This caching
> >> >> also depends on the hash value of the function that is being cached
> >> >> (calculated using the digest package). I noticed that computations
> >> >> that should already be cached are recomputed when switching from an
> >> >> interactive session to a BATCH session. Therefore, I wrote a test
> >> >> script:
> >> >>
> >> >> library(digest)
> >> >> testfun <- function() {
> >> >>    return(NULL)
> >> >> }
> >> >> testval <- "testval"
> >> >> print(digest(testfun))
> >> >> print(serialize(testfun, connection = NULL))
> >> >>
> >> >> and executed it once using input redirection from a file (testFile.R)
> >> >> and once by copying the code into an interactive R session. The
> >> >> hash values of the functions differ. As digest internally relies on
> >> >> serialize, I also checked there and found that digest is not the
> >> >> reason for the discrepancy. Instead, the serialized value of the
> >> >> function already differs between the BATCH and interactive sessions.
> >> >>
> >> >> I was wondering:
> >> >> 1. Is this a feature or a bug? It feels like a bug to me: all the
> >> >> inputs are identical, so I would expect the output to be identical
> >> >> as well. Is there something I am overlooking?
> >> >> 2. Is there a way to get consistent hash values for functions
> >> >> between BATCH and interactive sessions?
> >> >>
> >> >> The output from the BATCH and interactive runs is below (along with
> >> >> sessionInfo()).
> >> >>
> >> >> Thank you very much for your help!
> >> >>
> >> >> Holger Hoefling
> >> >>
> >> >> ---------------------------------
> >> >> BATCH run (via input redirection):
> >> >>
> >> >> $ R --vanilla < testFile.R
> >> >>
> >> >> R version 3.1.0 (2014-04-10) -- "Spring Dance"
> >> >> Copyright (C) 2014 The R Foundation for Statistical Computing
> >> >> Platform: x86_64-unknown-linux-gnu (64-bit)
> >> >>
> >> >> R is free software and comes with ABSOLUTELY NO WARRANTY.
> >> >> You are welcome to redistribute it under certain conditions.
> >> >> Type 'license()' or 'licence()' for distribution details.
> >> >>
> >> >> R is a collaborative project with many contributors.
> >> >> Type 'contributors()' for more information and
> >> >> 'citation()' on how to cite R or R packages in publications.
> >> >>
> >> >> Type 'demo()' for some demos, 'help()' for on-line help, or
> >> >> 'help.start()' for an HTML browser interface to help.
> >> >> Type 'q()' to quit R.
> >> >>
> >> >>> library(digest)
> >> >>> testfun <- function() {
> >> >>
> >> >> +     return(NULL)
> >> >> + }
> >> >>>
> >> >>> print(digest(testfun))
> >> >>
> >> >> [1] "b03160b9250f0d5b5bcce42bd86d8e56"
> >> >>>
> >> >>> print(serialize(testfun, connection = NULL))
> >> >>
> >> >>  [1] 58 0a 00 00 00 02 00 03 01 00 00 02 03 00 00 00 04 03 00 00 00 fd 00 00 00
> >> >> [26] fe 00 00 00 06 00 00 00 01 00 04 00 09 00 00 00 01 7b 00 00 00 02 00 00 00
> >> >> [51] 06 00 00 00 01 00 04 00 09 00 00 00 06 72 65 74 75 72 6e 00 00 00 02 00 00
> >> >> [76] 00 fe 00 00 00 fe 00 00 00 fe
> >> >>>
> >> >>> sessionInfo()
> >> >>
> >> >> R version 3.1.0 (2014-04-10)
> >> >> Platform: x86_64-unknown-linux-gnu (64-bit)
> >> >>
> >> >> locale:
> >> >> [1] C
> >> >>
> >> >> attached base packages:
> >> >> [1] stats     graphics  grDevices utils     datasets  methods   base
> >> >>
> >> >> other attached packages:
> >> >> [1] digest_0.6.4
> >> >>>
> >> >>>
> >> >>
> >> >> ----------------------------------------------
> >> >> Interactive run:
> >> >>
> >> >> $ R --vanilla
> >> >>
> >> >> R version 3.1.0 (2014-04-10) -- "Spring Dance"
> >> >> Copyright (C) 2014 The R Foundation for Statistical Computing
> >> >> Platform: x86_64-unknown-linux-gnu (64-bit)
> >> >>
> >> >> R is free software and comes with ABSOLUTELY NO WARRANTY.
> >> >> You are welcome to redistribute it under certain conditions.
> >> >> Type 'license()' or 'licence()' for distribution details.
> >> >>
> >> >> R is a collaborative project with many contributors.
> >> >> Type 'contributors()' for more information and
> >> >> 'citation()' on how to cite R or R packages in publications.
> >> >>
> >> >> Type 'demo()' for some demos, 'help()' for on-line help, or
> >> >> 'help.start()' for an HTML browser interface to help.
> >> >> Type 'q()' to quit R.
> >> >>
> >> >>> library(digest)
> >> >>> testfun <- function() {
> >> >>
> >> >> +     return(NULL)
> >> >> + }
> >> >>>
> >> >>> print(digest(testfun))
> >> >>
> >> >> [1] "fada482d2894088b079a8e56b7044862"
> >> >>>
> >> >>> print(serialize(testfun, connection = NULL))
> >> >>
> >> >>   [1] 58 0a 00 00 00 02 00 03 01 00 00 02 03 00 00 00 06 03 00 00 04 02 00 00 00
> >> >>  [26] 01 00 04 00 09 00 00 00 06 73 72 63 72 65 66 00 00 03 0d 00 00 00 08 00 00
> >> >>  [51] 00 01 00 00 00 0c 00 00 00 03 00 00 00 01 00 00 00 0c 00 00 00 01 00 00 00
> >> >>  [76] 01 00 00 00 03 00 00 04 02 00 00 00 01 00 04 00 09 00 00 00 07 73 72 63 66
> >> >> [101] 69 6c 65 00 00 00 04 00 00 00 00 00 00 00 f2 00 00 04 02 00 00 00 01 00 04
> >> >> [126] 00 09 00 00 00 05 6c 69 6e 65 73 00 00 00 10 00 00 00 01 00 04 00 09 00 00
> >> >> [151] 00 2b 74 65 73 74 66 75 6e 20 3c 2d 20 66 75 6e 63 74 69 6f 6e 28 29 20 7b
> >> >> [176] 0a 20 20 20 20 72 65 74 75 72 6e 28 4e 55 4c 4c 29 0a 7d 0a 00 00 04 02 00
> >> >> [201] 00 00 01 00 04 00 09 00 00 00 08 66 69 6c 65 6e 61 6d 65 00 00 00 10 00 00
> >> >> [226] 00 01 00 04 00 09 00 00 00 00 00 00 00 fe 00 00 00 fe 00 00 04 02 00 00 00
> >> >> [251] 01 00 04 00 09 00 00 00 05 63 6c 61 73 73 00 00 00 10 00 00 00 02 00 04 00
> >> >> [276] 09 00 00 00 0b 73 72 63 66 69 6c 65 63 6f 70 79 00 04 00 09 00 00 00 07 73
> >> >> [301] 72 63 66 69 6c 65 00 00 00 fe 00 00 04 02 00 00 06 ff 00 00 00 10 00 00 00
> >> >> [326] 01 00 04 00 09 00 00 00 06 73 72 63 72 65 66 00 00 00 fe 00 00 00 fe 00 00
> >> >> [351] 00 fd 00 00 00 fe 00 00 02 06 00 00 04 02 00 00 01 ff 00 00 00 13 00 00 00
> >> >> [376] 02 00 00 03 0d 00 00 00 08 00 00 00 01 00 00 00 17 00 00 00 01 00 00 00 17
> >> >> [401] 00 00 00 17 00 00 00 17 00 00 00 01 00 00 00 01 00 00 04 02 00 00 02 ff 00
> >> >> [426] 00 03 ff 00 00 04 02 00 00 06 ff 00 00 00 10 00 00 00 01 00 04 00 09 00 00
> >> >> [451] 00 06 73 72 63 72 65 66 00 00 00 fe 00 00 03 0d 00 00 00 08 00 00 00 02 00
> >> >> [476] 00 00 05 00 00 00 02 00 00 00 10 00 00 00 05 00 00 00 10 00 00 00 02 00 00
> >> >> [501] 00 02 00 00 04 02 00 00 02 ff 00 00 03 ff 00 00 04 02 00 00 06 ff 00 00 00
> >> >> [526] 10 00 00 00 01 00 04 00 09 00 00 00 06 73 72 63 72 65 66 00 00 00 fe 00 00
> >> >> [551] 04 02 00 00 02 ff 00 00 03 ff 00 00 04 02 00 00 00 01 00 04 00 09 00 00 00
> >> >> [576] 0b 77 68 6f 6c 65 53 72 63 72 65 66 00 00 03 0d 00 00 00 08 00 00 00 01 00
> >> >> [601] 00 00 00 00 00 00 03 00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 01 00 00
> >> >> [626] 00 03 00 00 04 02 00 00 02 ff 00 00 03 ff 00 00 04 02 00 00 06 ff 00 00 00
> >> >> [651] 10 00 00 00 01 00 04 00 09 00 00 00 06 73 72 63 72 65 66 00 00 00 fe 00 00
> >> >> [676] 00 fe 00 00 00 01 00 04 00 09 00 00 00 01 7b 00 00 00 02 00 00 00 06 00 00
> >> >> [701] 00 01 00 04 00 09 00 00 00 06 72 65 74 75 72 6e 00 00 00 02 00 00 00 fe 00
> >> >> [726] 00 00 fe 00 00 00 fe
> >> >>>
> >> >>> sessionInfo()
> >> >>
> >> >> R version 3.1.0 (2014-04-10)
> >> >> Platform: x86_64-unknown-linux-gnu (64-bit)
> >> >>
> >> >> locale:
> >> >> [1] C
> >> >>
> >> >> attached base packages:
> >> >> [1] stats     graphics  grDevices utils     datasets  methods   base
> >> >>
> >> >> other attached packages:
> >> >> [1] digest_0.6.4
> >> >>>
> >> >>>
> >> >>
> >> >> ______________________________________________
> >> >> R-devel at r-project.org mailing list
> >> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >> >>
> >> >
> >> > --
> >> > Luke Tierney
> >> > Ralph E. Wareham Professor of Mathematical Sciences
> >> > University of Iowa                  Phone:             319-335-3386
> >> > Department of Statistics and        Fax:               319-335-3017
> >> >    Actuarial Science
> >> > 241 Schaeffer Hall                  email:   luke-tierney at uiowa.edu
> >> > Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
> >>
> >
> >
> >
> > --
> > Gabriel Becker, PhD
> > Computational Biologist
> > Bioinformatics and Computational Biology
> > Genentech, Inc.
>



