[Bioc-devel] splitting simpleSingleCell into self-contained vignettes

Aaron Lun alun at wehi.edu.au
Tue Dec 12 21:49:15 CET 2017

Thanks Andrzej.

> Thank you. I've edited the workflow index page by introducing a separate
> "Single-cell Workflows" section, and by substituting the previous link to
> your workflow by links to the individual parts.

Great, I'm looking forward to seeing it. Do you know how frequently the
index page (I assume we're talking about
https://bioconductor.org/help/workflows/) updates? I assume your edits
haven't propagated through the system yet.

> As discussed during EuroBioc, I'm happy to restructure the index page by
> grouping workflows by topic. It would be really helpful if authors would
> chime in to suggest the most relevant sections for their workflows.

I can chip in with two that I'm involved in:

"Differential Binding from ChIP-seq data
<https://bioconductor.org/help/workflows/chipseqDB/>" => ChIP-seq workflows
"Gene-level RNA-seq differential expression and pathway analysis
<https://bioconductor.org/help/workflows/RnaSeqGeneEdgeRQL/>" => RNA-seq workflows

Of course, it depends on how granular you want the topics to be. For
example, I only see one ChIP-seq workflow, so that particular section
might be a bit lonely for a while (I am planning to split that into two
workflows later).



> On Tue, Dec 12, 2017 at 7:19 PM, Aaron Lun <Aaron.Lun at cruk.cam.ac.uk> wrote:
>> The split-up workflows seem to have built successfully:
>> http://docbuilder.bioconductor.org:8080/job/simpleSingleCell/
>> Is there something I have to do to get a blurb specific to each
>> vignette, as observed for "Annotation_Resources" vs
>> "Annotating_Genomic_Ranges"?
>> The various vignettes are ordered pedagogically, so the order in which
>> they are presented in the workflow page might require some manual
>> specification. It would also be nice if the multiple simpleSingleCell
>> workflows are grouped together, to avoid being intermingled with other
>> workflows on the page.
>> Finally, could we get a separate "single-cell workflows" section? The
>> current "Basic/Advanced" partition is pretty crude, and I can see
>> opportunities for more detailed stratification, e.g., by ChIP-seq,
>> RNA-seq, single-cell RNA-seq, proteomics (including mass cytometry).
>> Cheers,
>> Aaron
>> On 11/12/17 20:24, Aaron Lun wrote:
>>> Thanks Val:
>>> Obenchain, Valerie wrote:
>>>> Hi,
>>>> On 12/11/2017 08:49 AM, Aaron Lun wrote:
>>>>> Following up on our earlier discussion:
>>>>> https://stat.ethz.ch/pipermail/bioc-devel/2017-October/011949.html
>>>>> I have split the simpleSingleCell workflow into three (four, if you
>>>>> include the introductory overview) self-contained Rmarkdown files. I am
>>>>> preparing them for submission to BioC's workflow builder, and I would
>>>>> like to check what is the best way to do this:
>>>>> i) Each workflow file goes into its own package.
>>>>> ii) All workflow files go into a single package.
>>>>> Option (i) is logistically easier but probably a bit odd conceptually,
>>>>> especially if users need to download "simpleSingleCell1",
>>>>> "simpleSingleCell2", "simpleSingleCell3", etc.
>>>>> Option (ii) is nicer but requires more coordination, as the BioC webpage
>>>>> builder needs to know that multiple HTMLs have been generated. It's
>>>>> also unclear to me whether this will run into problems with the DLL
>>>>> limit - does R restart when compiling each vignette?
>>>> You could do either but I'd say option 2 is easier from a maintenance
>>>> standpoint and probably for the user. Maybe you've seen this but an
>>>> example is the annotation workflow package which houses 2 workflows:
>>>> ~/repos/svn/workflows >ls annotation/vignettes/
>>>> Annotating_Genomic_Ranges.Rmd  Annotation_Resources.Rmd
>>>> databaseTypes.png  display.png
>>>> Each has an informative name and is presented on the website as an
>>>> individual workflow:
>>>> https://bioconductor.org/help/workflows/
>>> I didn't know that, thanks.
>>>> I don't think more coordination is involved - you just have multiple
>>>> files in vignettes/. And, as you mentioned, it's a bonus that when a
>>>> user downloads the annotation package they get all related workflows.
>>>> A fresh R session is started for each package but not for each
>>>> vignette in the package.
>>> Ah. That's a shame, I was hoping to reduce the sensitivity to the DLL
>>> limit.
>>> But now that I think about it: maybe that's not actually a problem,
>>> provided the BioC workflow builders have a high DLL limit. The main
>>> issue was that *users* were running into the DLL limit; by splitting the
>>> workflow up, users should no longer be tempted to run everything at once, thus
>>> avoiding the limit on their machines. Of course, Bioconductor can
>>> control its own build machines, so as long as they set R_MAX_NUM_DLLS
>>> high, it should still build and show up on the website.
>>>>> Any thoughts would be appreciated. I'm also happy to be a guinea pig for
>>>>> any SVN->Git transition for the workflow packages, if that's on the radar.
>>>> Nitesh has created git repos for the workflow packages and Andrzej is
>>>> adapting the BBS code to incorporate them into the builds. We
>>>> guesstimate this will be done by the end of the year. You shouldn't
>>>> have to do anything on your end - once we're ready to switch over
>>>> we'll let you know and send the new location of the workflow in git.
>>> Cool, looking forward to it.
>>> -A
>>>> Val
>>>>> Cheers,
>>>>> Aaron
