[R-pkg-devel] Questions about making a database package (Rpolyhedra)

Mark van der Loo m@rk@v@nderloo @ending from gm@il@com
Fri Jun 29 09:15:11 CEST 2018


Hi Alejandro,

Brooke Anderson gave a nice talk at useR!2017 addressing this exact issue.
See the slides at
https://schd.ws/hosted_files/user2017/19/anderson-eddelbuettel-use_r_talk.pdf
The basic idea is to use an external CRAN-like repository for
the data back-end. Brooke used 'drat' to set up such a repo.
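
A minimal sketch of how that pattern can look on the CRAN package side, as I
understand it: the data package is only suggested, and the code checks for it
before use. The data package name and the repository URL are placeholders made
up for illustration, and the helper function names are hypothetical; the
slides describe the setup Brooke actually used.

```r
# Sketch only: the data package name and repository URL are placeholders,
# not taken from this thread.
data_pkg  <- "RpolyhedraDB"
data_repo <- "https://example-user.github.io/drat"

# The DESCRIPTION of the CRAN package would then carry, roughly:
#   Suggests: RpolyhedraDB
#   Additional_repositories: https://example-user.github.io/drat

# TRUE if the optional data package is installed and loadable.
has_data_pkg <- function(pkg = data_pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# Build the command a user can run to install the data package.
install_hint <- function(pkg = data_pkg, repo = data_repo) {
  sprintf('install.packages("%s", repos = "%s", type = "source")', pkg, repo)
}

# Call at the top of any function that needs the full database.
require_data_pkg <- function(pkg = data_pkg, repo = data_repo) {
  if (!has_data_pkg(pkg)) {
    stop("Package '", pkg, "' is needed for this function. Install it with:\n  ",
         install_hint(pkg, repo), call. = FALSE)
  }
  invisible(TRUE)
}
```

On the publishing side, the data package tarball goes into the drat
repository (drat::insertPackage() does this), so the CRAN package itself
stays small.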

-Mark



On Thu, 28 Jun 2018 at 13:56, alejandro baranek <
alejandrobaranek using gmail.com> wrote:

> Hi Joris:
>
> Thank you for your comments.
> Of course, we are using https for additional downloads.
>
> For the moment we do not need GitHub LFS, but it is an alternative we
> can explore after this short step: our immediate goal is to make the
> package lighter on CRAN. It is now 35 kB, so I think we managed that well.
>
> We are defining an XSD for exporting polyhedra in XML. After that, it will
> be possible to build an API on top of the polyhedra database and make the
> improvement you suggest. But that will take time: we have no funding yet
> for this project and want to implement some functionality to make it more
> valuable first. It is on our roadmap, though, to make it easy to port to
> other languages. The interface we are using is really simple; it will
> probably be the API interface too.
>
> Best, Ale.
>
>
> 2018-06-28 5:23 GMT-03:00 Joris Meys <Joris.Meys using ugent.be>:
>
> > Hi Ale,
> >
> > I'd personally use a more specific solution like GitHub LFS (Large File
> > Storage) for a versioned database. You should also check with CRAN itself,
> > as they keep high standards for everything that's not a standard install.
> > More specifically (from the CRAN policies):
> >
> > Downloads of additional software or data as part of package installation
> > or startup should only use secure download mechanisms (e.g., ‘https’ or
> > ‘ftps’).
> >
> > Personally I would store that information in a public database somewhere
> > with a (minimal) API. This can then be extended without inflating the
> > download and would allow people to install only a subset of what they need.
> > That would also allow people to port your work to other languages by
> > simply writing a wrapper around the DB API. It's not a necessity, but I
> > thought it was worth mentioning as an option.
> >
> > Cheers
> > Joris
> >
> > On Wed, Jun 27, 2018 at 10:22 PM, alejandro baranek <
> > alejandrobaranek using gmail.com> wrote:
> >
> >> This is our situation now: about 150 polyhedra are published, but more
> >> than 800 are ready to publish, and because of the package size we cannot
> >> publish all of them.
> >>
> >> It is not a problem on GitHub; it is a problem on CRAN, with build time
> >> (we fixed the testing time with simple sampling techniques). I would like
> >> to hear more from experienced package developers about these issues, but
> >> we seem to have found a solution.
> >>
> >> We decided to make another GitHub repo, RpolyhedraDB. When you install the
> >> package, it downloads the database, at the tag recorded in the package's
> >> data folder, into a directory under the user's home directory. So the
> >> package will be minimal for CRAN, will be RR, and will install the
> >> database on first use (on Travis or other QA/continuous integration it
> >> will of course install it). It will be possible to set up different DB
> >> sizes using the tags, in case we find that preferable for users.
> >>
> >>
> >> Best, Ale.
> >>
> >>
> >> 2018-03-29 4:43 GMT-03:00 Berry Boessenkool <berryboessenkool using hotmail.com>:
> >>
> >> >
> >> > I assume you cannot simply reduce the 150 to a few for demonstration
> >> > purposes?
> >> >
> >> >
> >> > I have seen people using DRAT packages on github for data, but gh is
> >> > limited in size restrictions as well...
> >> >
> >> >
> >> > No expert in this, but maybe this helps a little bit...
> >> >
> >> > Berry
> >> >
> >> > ------------------------------
> >> > *From:* R-package-devel <r-package-devel-bounces using r-project.org> on
> >> > behalf of alejandro baranek <alejandrobaranek using gmail.com>
> >> > *Sent:* Tuesday, March 27, 2018 19:26
> >> > *To:* r-package-devel using r-project.org
> >> > *Subject:* [R-pkg-devel] Questions about making a database package
> >> > (Rpolyhedra)
> >> >
> >> > Hello group:
> >> >
> >> > We released Rpolyhedra V0.2 last month. It is able to scrape more than
> >> > 800 polyhedra definitions from public sources. At V0.2.4 we are
> >> > publishing only 150 because of the time needed to scrape and test all
> >> > the polyhedra and the resulting size of the package. The difference is
> >> > a configuration setting in zzz.R that is very simple to change (anyone
> >> > who wants to try it can build the package themselves). The source files
> >> > of the polyhedra definitions alone are more than 12 MB (we include them
> >> > in the data folder so that the package is self-sufficient).
> >> >
> >> > But we have doubts about good practices for publishing a database
> >> > package.
> >> >
> >> > We think the solution is to split the package into an internal
> >> > Rpolyhedra-lib, open source but not on CRAN, and Rpolyhedra with a
> >> > catalog which makes it possible to connect to that repo and download
> >> > scraped polyhedra on demand.
> >> >
> >> > We still have to think through how to connect the two repositories, but
> >> > before touching any code we want to hear from experienced package
> >> > developers, and the community in general, about how to do this.
> >> > Do you know of any package with behavior analogous to this one? We
> >> > didn't find any.
> >> >
> >> > Best, Ale.
> >> > --
> >> >  alejandro baranek
> >> > @ken4rab <https://twitter.com/ken4rab>
> >> > qbotics <http://qbotics.tumblr.com/> | surferinvaders
> >> > <http://surferinvaders.tumblr.com> | algebraic-soundscapes
> >> > <http://imaginary.org/content/algebraic-soundscapes> | surfer-shuffle
> >> > <http://imaginary.org/program/surfer-shuffle>
> >> >
> >> >         [[alternative HTML version deleted]]
> >> >
> >> > ______________________________________________
> >> > R-package-devel using r-project.org mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-package-devel
> >> >
> >>
> >
> >
> >
> > --
> > Joris Meys
> > Statistical consultant
> >
> > Department of Data Analysis and Mathematical Modelling
> > Ghent University
> > Coupure Links 653, B-9000 Gent (Belgium)
> >
> > tel: +32 (0)9 264 61 79
> > -----------
> > Biowiskundedagen 2017-2018
> > http://www.biowiskundedagen.ugent.be/
> >
> > -------------------------------
> > Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
> >
>
