[Bioc-devel] lazyData (Kasper Daniel Hansen)

Kasper Daniel Hansen kasperdanielhansen at gmail.com
Sat Jul 30 22:58:30 CEST 2016


Fair enough; I'm aware of this.  But loading the annotation package into
the environment of a function means reloading it every time I need it,
which is pretty often.  LazyData is just so much nicer.
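
As a minimal sketch of what the function-environment approach quoted below looks like in practice, the following uses `mtcars` from the base `datasets` package as a stand-in for an annotation object. The function name `count_rows` is hypothetical and does not come from any package discussed in this thread. Each call runs `utils::data()` again, which is the repeated loading referred to above.

```r
# Stand-in example: `mtcars` (from the base `datasets` package) plays the role
# of a large annotation object; `count_rows` is a hypothetical function name.
count_rows <- function() {
    mtcars <- NULL  # placeholder so static code checks see the variable defined
    # Load the data set into this function's own environment, not the global one.
    utils::data("mtcars", package = "datasets", envir = environment())
    nrow(mtcars)
}

n <- count_rows()  # the data set is loaded again on every call

# Record whether the call left an object named `mtcars` in the global environment.
leaked <- exists("mtcars", envir = globalenv(), inherits = FALSE)
```

In a fresh session `leaked` should be `FALSE`, since the object only ever exists inside the function call.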

Best,
Kasper

On Sat, Jul 30, 2016 at 12:44 PM, Alex Pickering <alexvpickering at gmail.com>
wrote:

> I was experiencing similar issues of long build times with LazyData TRUE so
> I made it FALSE.
>
> You can specify the environment in which data is loaded (it doesn't have to
> be the global environment). For example, to load the cmap_es data from the
> ccdata package call this from within a function:
>
> utils::data("cmap_es", package = "ccdata", envir = environment())
>
> It will get loaded into the function's environment (not global). You may
> also need to add a `cmap_es = NULL` prior to loading it; otherwise the build
> process will complain about not declaring the cmap_es variable before using
> it.
>
> On Sat, Jul 30, 2016 at 3:00 AM, <bioc-devel-request at r-project.org> wrote:
>
> > Message: 1
> > Date: Fri, 29 Jul 2016 16:26:45 -0400
> > From: Kasper Daniel Hansen <kasperdanielhansen at gmail.com>
> > To: Martin Morgan <martin.morgan at roswellpark.org>
> > Cc: "bioc-devel at r-project.org" <bioc-devel at r-project.org>
> > Subject: Re: [Bioc-devel] lazyData
> >
> > With LazyData true you indeed don't load the data until it is needed.
> > My guess, from skimming the code extremely fast, is that the extreme
> > requirements (memory and time) during installation arise because the data
> > objects need to get loaded and somehow modified for this to happen.
> >
> > Re. the global environment: if my package has an object TEST, and LazyData
> > is TRUE, when I do (say)
> >   data(TEST)
> > or use TEST somehow, TEST doesn't exist in the Global environment.  But if
> > LazyData is FALSE and I do data(TEST), TEST gets copied into the Global
> > environment, which is kind of irritating when it is annotation data because
> > it seems fragile to me (perhaps it is not).
> >
> > Best,
> > Kasper
> >
> > On Fri, Jul 29, 2016 at 3:38 PM, Martin Morgan <
> > martin.morgan at roswellpark.org> wrote:
> >
> > > On 07/18/2016 10:52 AM, Kasper Daniel Hansen wrote:
> > >
> > >> This is a report on my testing with lazyData turned on and off wrt.
> > >> installation time and memory requirements.  It turns out that using
> > >> lazyData dramatically increases memory consumption and time for a
> > >> (admittedly large) annotation package.  Perhaps this is something we
> > >> should think about wrt. annotation and data packages.
> > >>
> > >> Test example is
> > >>   IlluminaHumanMethylationEPICanno.ilm10b2.hg19
> > >> an annotation package for minfi.  The .tar.gz for this package is 113
> > >> so it's not small.
> > >>
> > >> I have explored using
> > >>   LazyData: yes/no in DESCRIPTION
> > >>   adding a single line data/datalist file containing the objects in the
> > >>   package
> > >>
> > >> What follows are timings and memory consumption of R CMD build + INSTALL
> > >> on my Mac laptop using an SSD drive.
> > >>
> > >>
> > >>   LazyData: yes
> > >>   datalist: no
> > >>   285 s
> > >>   3.22 GB (values as high as 3.8 GB seen)
> > >>
> > >>   LazyData: no
> > >>   datalist: no
> > >>   81 s
> > >>   1.64 GB
> > >>
> > >>   LazyData: no
> > >>   datalist: yes
> > >>   19 s
> > >>   0.38 GB
> > >>
> > >
> > > Hi Kasper -- I have to admit my ignorance on the miracle of lazy data. Can
> > > you clarify what one gains from LazyData? I kind of thought that with
> > > LazyData: true the data was only loaded when needed, but that doesn't seem
> > > consistent with the picture you paint above? Also, what's the discussion
> > > about global variables?
> > >
> > > Martin
> > >
> > >
> > >> (following combination is not mentioned by R-exts, and while it still uses
> > >> tons of memory, it seems to be 1 minute faster; redid measuring once to
> > >> confirm this)
> > >>   LazyData: yes
> > >>   datalist: yes
> > >>   226 s
> > >>   3.26 GB (values as high as 3.9 GB seen)
> > >>
> > >> Making the data LazyLoaded is pretty nice; for one thing, it avoids
> > >> polluting the global environment.
> > >>
> > >> But it seems that it would be worthwhile to consider if some of this could
> > >> be done prior to the package build time.  Perhaps not, but for sure we are
> > >> spending resources on the building and installing of this by the build
> > >> system.
> > >>
> > >> I started going down this route because my Travis build started being
> > >> killed due to 3+GB being used. I really don't like turning off LazyLoad
> > >> because of the global environment issue, but the numbers are kind of
> > >> extreme here.
> > >>
> > >> Best,
> > >> Kasper
> > >>
> > >> _______________________________________________
> > >> Bioc-devel at r-project.org mailing list
> > >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> > >>
> > >>
