[Bioc-devel] Large vignettes

Lambda Moses dlu2 using caltech.edu
Thu Jan 12 21:47:01 CET 2023


Vincent made some good points. I have experience writing this kind of
book and can confirm them. I wrote the Museum of Spatial
Transcriptomics book, and the Voyager vignettes will probably become
book-sized collectively as we add more vignettes about different
technologies. Those take a while to build on GitHub Actions: I think
about 40 minutes for the Museum book and over an hour for Voyager,
after installing dependencies. That is much longer than R CMD check
(which takes 10 minutes or so), so hosting more books on Bioconductor
will put more demand on the build system.

Regarding Vincent's second point, this is why I use GitHub Actions to
automatically build the Museum book and the development version of the
Voyager website on a schedule; the build fails if some packages have
breaking changes or get removed from CRAN. Again, if Bioconductor hosts
such books and they proliferate, the repeated scheduled builds would
put more demand on the build system. Also, because GitHub Actions has
limited computational resources, a successful build is a sanity check
that the book can run on smaller machines, which means more people can
run it, even with large datasets.

I agree with the third point. At present I don't know of a flagging
mechanism. As the Museum book is rebuilt with new versions of the
spatial transcriptomics literature database, the plots change and
sometimes no longer match the text descriptions, and I'm really behind
on updating the text. It can be a lot of work. In general I wish that we
were better rewarded for writing and maintaining documentation and for
maintaining existing software.
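
For illustration only, one way such a flagging mechanism could work is
to record a fingerprint of each computed result when its accompanying
text is reviewed, then compare fingerprints on every rebuild. The sketch
below is in Python rather than R. The function names (fingerprint,
stale_sections), the section name, and the cluster labels are all
hypothetical, and this is not an existing Bioconductor tool.

```python
import hashlib
import json


def fingerprint(result):
    """Return a stable SHA-256 digest of a JSON-serializable result."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def stale_sections(recorded, current):
    """List sections whose current result no longer matches the
    fingerprint recorded when the text was last reviewed."""
    return sorted(
        name
        for name, result in current.items()
        if recorded.get(name) != fingerprint(result)
    )


# Hypothetical data: cluster labels as they were when the prose was written.
recorded = {"clusters": fingerprint({"1": "T cells", "2": "B cells"})}
# The same computation after an upstream change silently swapped the labels.
current = {"clusters": {"1": "B cells", "2": "T cells"}}

# Any section listed here needs its text re-read against the new output.
print(stale_sections(recorded, current))
```

A scheduled build could fail, or just emit a warning, whenever the list
is non-empty, which would turn silent drift between text and results
into something an author gets notified about.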

There are more issues with Bioconductor books than the build system. I
talked to the authors of the upcoming Orchestrating Spatial
Transcriptomics Analysis (OSTA) book about contributing a chapter. They
suggested that I submit a workflow package instead, because they want to
keep OSTA to tried and true "best practices" and be as neutral as
possible. There are also the csaw and SingleR books; those too cover
fairly classic and popular packages. This gave me the impression that at
present the bar is really high to get Bioconductor to host your book, and
so far it's restricted to core members and core-ish packages. Maybe
keeping the number of books small is related to challenges with the
build system. Maybe it reflects how the Bioconductor book concept
currently differs from the workflow concept. Submitting a workflow
package is another way to get Bioconductor to host something like a long
vignette.

On 1/12/23 3:42 AM, Vincent Carey wrote:
>
>
> On Thu, Jan 12, 2023 at 5:29 AM Lluís Revilla 
> <lluis.revilla using gmail.com> wrote:
>
>     Hi all,
>
>     Perhaps instead of long vignettes, it would be better to use a
>     book hosted and in sync with the packages at Bioconductor.
>     There are already a few: https://www.bioconductor.org/books/release/
>     But I was not able to find how to submit such bookdowns to
>     Bioconductor (I briefly searched the website and the dev book at
>     https://contributions.bioconductor.org/docs.html?q=book).
>     I think the limits are less restrictive and there is no minimum
>     size of chapters or documentation, but I am not sure.
>
>
> I like this idea -- but I want to say a few things, really just personal
> observations, nothing "official", and there may need to be corrections
> to some of these remarks.
>
> First, the concept that the bioconductor build system could handle
> monograph-size artifacts was improvised as the OSCA book came into
> being.  There is interesting and intricate infrastructure there
> supporting cross-referenced computations to reduce redundant
> computation, but I don't think that has become an "authoring standard"
> and it requires some specialized knowledge to use.  Upshot -- we have
> some attractive examples of monograph/book artifacts but we don't
> really have a standard approach to guide authors to efficiently
> deployable products, and the tooling to build and check the monographs
> regularly is somewhat limited.
>
> Second, a book becomes a sub-ecosystem, necessarily of both CRAN and
> Bioconductor.  We want the book to remain valid and computable at all
> times, certainly in the release branch, but as packages on which the
> book depends change and perhaps disappear (happens primarily with CRAN)
> the book production can fail.  Authors have to be vigilant and
> responsive to events of this sort.
>
> Third, the narrative of a book is synchronized with the computations
> when it is authored, but underlying software evolution can make prose
> statements in the book become false over time.  We saw this with text
> describing cluster identities in single-cell analysis ... when a
> certain projection function in an upstream package was modified,
> cluster labeling silently changed and the text became false.  We want
> some kind of flagging procedure that will alert us to changes of this
> sort.
>
> There are technical responses to all of these observations but
> implementing well-engineered solutions will require more resources
> than we currently have.  The workshop authoring method used in
> https://github.com/seandavi/BuildABiocWorkshop is surely relevant;
> Alex Mahmoud has a work in progress called BiocDeployables that is
> also relevant.  Ultimately we want to improve communication of good
> analytic methods to the scientific community, and monograph-scale
> resources are definitely useful, but smaller-scale resources that
> don't require the technology of package production can also be
> valuable, and BiocDeployables goes in that direction. Maintenance and
> the avoidance of bit/doc rot are first-class concerns and really
> require author commitment.
>
>
>
>     Some authors already have books outside bioconductor to have
>     extensive examples of their packages.
>     They will also benefit from having them with the Bioconductor
>     framework and in sync with the packages released to the users.
>
>     Best,
>
>     Lluís
>
>
>     On Wed, 4 Jan 2023 at 21:39, Vincent Carey
>     <stvjc using channing.harvard.edu> wrote:
>
>         I am glad you brought this up here, and I welcome further
>         discussion on this mailing list.  It is important to
>         understand the constraints on development that arise from
>         Bioconductor's package guidelines.
>
>         I don't think we want to change the limits on package payload
>         size without understanding the consequences for users and our
>         build system.  The split approach mentioned by Lambda seems
>         sensible to me, and I hope it is not too burdensome.
>         Additional commentary and details from the community are
>         welcome.
>
>         On Wed, Jan 4, 2023 at 3:21 PM Lambda Moses <dlu2 using caltech.edu>
>         wrote:
>
>         > Hi Adam,
>         >
>         > I also ran into this problem, and I would like some input
>         > from the Bioc Core Team. I worked around it by writing a
>         > minimal vignette in the main branch. Then I made a
>         > documentation branch, where I have the same code as in the
>         > main branch, but with more elaborate vignettes used to build
>         > a pkgdown website. I made a rule for myself that I can only
>         > merge from the main or devel branch to the documentation
>         > branch but not the other way round. I would switch branches
>         > when I find a bug or want a new feature while writing the
>         > vignettes. You can see the main branch here:
>         > https://github.com/pachterlab/voyager/tree/main
>         > The documentation branch is here:
>         > https://github.com/pachterlab/voyager/tree/documentation
>         >
>         > I kind of wonder if the 5 MB rule is outdated in the age of
>         > increasing computer power and internet speed. A jpeg photo
>         > can easily exceed 5 MB. I also wonder if this rule is
>         > deliberately kept for good reasons, like to make R more
>         > inclusive to disadvantaged people with limited internet
>         > services.
>         >
>         > Regards,
>         >
>         > Lambda
>         >
>         > _______________________________________________
>         > Bioc-devel using r-project.org mailing list
>         > https://stat.ethz.ch/mailman/listinfo/bioc-devel
>         >
>
>
>


