[Bioc-devel] Issues with package size

Zuguang Gu jokergoo @end|ng |rom gm@||@com
Fri Sep 24 13:35:22 CEST 2021


Hi Hervé, Hi all,

Yes, I totally understand and agree with the standards for Bioc package
development, and I always try to follow them as best I can. But regarding
"real rerunnable vignettes", I think there are several scenarios that make
this standard really difficult to follow:

1. Some examples need a long time to run or depend on a very large dataset,
and it is impossible to use a reduced subset of the data. E.g. in my
HilbertCurve package, there are several examples visualizing the complete
chromosome 21 (around 15 examples, which take about 30-45 min to run).
Because this package gives a "global view of a genome", it makes no sense
to only work on a small subset of genomic regions. My solution is that on my
local machine the code chunks that generate these plots are actually
evaluated and also generate cached figures, while on the Bioc server the
code chunks are not evaluated and the cached figures are used directly.
I guess packages for single-cell RNA-seq analysis might have the same issue,
since the analysis only makes sense with thousands of cells or more (just a
guess, I don't have experience with scRNA-seq data analysis).

2. Some vignettes generate or include many figures (static or GIF), which
makes the final package very large (tens of MBs or maybe more), especially
for packages focused on data visualization. A good vignette should contain
lots of example figures that illustrate the various usages of the package
for users. E.g. my ComplexHeatmap package contains hundreds of figures, so
I decided to host the vignette somewhere else.
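
As an illustration of scenario 1, the local-versus-server switch could be
sketched roughly as below. This is only a sketch under assumptions, not the
actual HilbertCurve setup: the environment variable RUN_HEAVY_CHUNKS and the
helper cached_figure() are made-up names, and the figure files are assumed
to be shipped inside the package.

```r
# Hypothetical switch: set RUN_HEAVY_CHUNKS=true on the local machine only.
# On the build server the variable is unset, so run_heavy is FALSE.
run_heavy <- identical(tolower(Sys.getenv("RUN_HEAVY_CHUNKS")), "true")

# Hypothetical helper: when `run` is TRUE, evaluate the plotting code and
# save the figure to `path`; otherwise do nothing and reuse the saved file.
# In both cases the path of the figure is returned.
cached_figure <- function(path, plot_fun, run = run_heavy) {
  if (run) {
    dir.create(dirname(path), showWarnings = FALSE, recursive = TRUE)
    png(path, width = 600, height = 600)
    on.exit(dev.off(), add = TRUE)  # close the device even if plot_fun() fails
    plot_fun()
  }
  path
}

# Stand-in for an expensive whole-chromosome plot.
fig <- cached_figure(file.path(tempdir(), "chr21_demo.png"),
                     function() plot(1:10))
```

In an R Markdown vignette the returned path could then be passed to
knitr::include_graphics(), so the same chunk shows a freshly drawn figure
locally and the stored one on the server.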

I can think of some solutions for this:

1. include a small and evaluable vignette that only contains the "core
analysis" or "the most used features" in the package, while hosting more
comprehensive vignettes somewhere else.

2. add extensive tests in the package to ensure the reliability of the
package. For example, in ComplexHeatmap, although there is almost no
runnable vignette, it actually includes hundreds of tests that will be
evaluated during `R CMD check`.
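
To make solution 2 concrete, a test of plotting code could look roughly like
the sketch below. R scripts placed in the tests/ folder are run by
`R CMD check`; the file name and the function draw_demo() are hypothetical
stand-ins, not the actual ComplexHeatmap tests.

```r
# tests/test-plots.R (hypothetical file name)
# draw_demo() stands in for a package plotting function; it is not a real API.
draw_demo <- function(m) {
  stopifnot(is.matrix(m))
  # Draw the matrix as a simple heatmap with the first row on top.
  image(t(m[nrow(m):1, , drop = FALSE]), axes = FALSE)
  invisible(dim(m))
}

pdf(NULL)  # null device: the plot is drawn, but no file is written
m <- matrix(seq_len(12), nrow = 3)
res <- draw_demo(m)
dev.off()

# The plotting code ran without error and saw the expected dimensions.
stopifnot(identical(res, c(3L, 4L)))
# Invalid input must fail with an error instead of producing a plot.
stopifnot(inherits(try(draw_demo("not a matrix"), silent = TRUE), "try-error"))
```

Because the plots go to a null device, such tests exercise the drawing code
on every check without adding any figures to the package size.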

Best regards,
Zuguang


On Thu, 23 Sept 2021 at 20:49, Hervé Pagès <hpages.on.github using gmail.com>
wrote:

> Hi Zuguang,
>
> On 23/09/2021 05:45, Zuguang Gu wrote:
> > Hi Giulia,
> >
> > I think it is ok to host the vignettes somewhere else. I have two packages
> > whose vignettes are hosted on GitHub Pages.
>
> Unfortunately this is something we **strongly** discourage.
>
>
>
> It's important to understand that your vignettes can break any time
> (e.g. when something they depend on changes), not just when you update
> your package. This is why Bioconductor vignettes should always be
> located in the vignettes/ folder of the package, and be "real"
> vignettes, that is, they must contain code chunks that get evaluated by
> 'R CMD check'.
>
>
> Best,
>
> H.
>
>
> >
> > http://www.bioconductor.org/packages/devel/bioc/html/ComplexHeatmap.html
> > https://www.bioconductor.org/packages/devel/bioc/html/cola.html
> >
> > But since now the vignettes are not automatically checked, you need to make
> > sure every time you update your package, the vignettes can be successfully
> > generated.
> >
> > Cheers,
> > Zuguang
> >
> >
> > On Thu, 23 Sept 2021 at 14:07, Giulia Pais <giuliapais1 using gmail.com> wrote:
> >
> >> Hello, I’m the developer of the package ISAnalytics.
> >> I’d like to ask if it is possible to have a file/vignette that links to
> >> other documentation outside the package (like a GitHub wiki), since we have
> >> issues with the maximum allowed package size due to vignette size. We would
> >> like to maintain as much documentation as possible and we already tried to
> >> reduce the data included, but it’s not sufficient.
> >> Thanks in advance
> >> Giulia Pais
> >>
> >> _______________________________________________
> >> Bioc-devel using r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>
> >
>
> --
> Hervé Pagès
>
> Bioconductor Core Team
> hpages.on.github using gmail.com
>
