[Bioc-devel] Handling larger data in vignette

Martin Morgan martin.morgan at roswellpark.org
Fri Jul 27 11:30:14 CEST 2018


Remember also that there are overall evaluation time limits.

One strategy might be to use existing, stable, publicly available
single-cell (SC) data sets, e.g., from

   http://imlspenticton.uzh.ch:3838/conquer/
   https://hemberg-lab.github.io/scRNA.seq.datasets/

These could be downloaded using BiocFileCache as a first step in the
vignette; the download cost would be 'paid' the first time, but
subsequent use would be from the locally cached data.
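
As a rough illustration of the caching idea, a vignette chunk might wrap the download in a small helper like the one below. This is only a sketch: `cached_path` is a hypothetical helper name, and the URL in the usage comment is a placeholder, not one of the data sets above. `BiocFileCache()` and `bfcrpath()` are the package's own functions.

```r
library(BiocFileCache)

# Return a local path for `src`, fetching it only if it is not cached yet.
# `cached_path` is a hypothetical helper, not part of BiocFileCache.
# By default the per-user cache is used, so it persists across sessions.
cached_path <- function(src, bfc = BiocFileCache(ask = FALSE)) {
    # bfcrpath() adds the resource on first use and afterwards returns
    # the path of the cached copy.
    unname(bfcrpath(bfc, src))
}

# Usage in a vignette chunk (placeholder URL, assumed to point at an .rds file):
# rds <- cached_path("https://example.org/some_dataset.rds")
# dat <- readRDS(rds)
```

Only the first build on a given machine pays the download cost; it still counts against the evaluation time limit on that first run.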

A second approach would be to create an ExperimentHub package, and to 
use that in your examples.

   http://bioconductor.org/packages/devel/ExperimentHub
   http://bioconductor.org/packages/devel/bioc/vignettes/ExperimentHub/inst/doc/CreateAnExperimentHubPackage.html
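
Once such a data package exists, using it from the software package's vignette could look roughly like this. It is a sketch under assumptions: `load_hub_resource` is a hypothetical helper, and "myDataPkg" and "EH0000" are made-up placeholders rather than real hub entries. `ExperimentHub()`, `query()` and `[[` are the usual hub accessors.

```r
library(ExperimentHub)

# Look up the records contributed by one data package and load one of them.
# `load_hub_resource`, `pkg` and `id` are hypothetical, for illustration only.
load_hub_resource <- function(pkg, id, hub = ExperimentHub()) {
    records <- query(hub, pkg)   # searches hub metadata for the package name
    stopifnot(id %in% names(records))
    records[[id]]                # fetched on first use, cached afterwards
}

# Usage in a vignette chunk (placeholder names):
# dat <- load_hub_resource("myDataPkg", "EH0000")
```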

The submission process would start by submitting the EH package. Once
the kinks in the experiment data package are worked out, the software
package would be added to the same issue. The data and software packages
would be accepted together.

Martin

On 07/27/2018 02:38 AM, Sarah Williams via Bioc-devel wrote:
> Hi,
> 
> I'm preparing a package for submission to Bioconductor, but hitting the 4 MB
> size limit due to examples in my vignette.
> 
> I do have a demo toy-sized dataset which I use for the bulk of the
> vignette. But I wanted to show real-data examples at the end because
> the approach doesn't work well on toy-sized data.
> 
> Conceivably everything except for the 'examples' section would make a
> 'complete' vignette (but probably not a very helpful one...). I'm wondering
> if I should static-ify just those examples? Might hit the 50% runnable
> code-chunk limit then though. Unfortunately it's a rank-based approach so I
> can't really take the top 100 genes for these particular objects.
> 
> Not sure how best to solve this, any tips/suggestions? Thanks!
> 
> (The problematic vignette is here:
> https://bioinformatics.erc.monash.edu/home/sarah.williams/projects/cell_groupings/doco/celaref_doco.html
> )
> 
> NB: To make matters worse this is a tool for comparing datasets - so I have
> multiples! They are public datasets and I haven't done anything exciting
> with them (nor would anyone else want to reuse processed objects) - so I
> don't think that I should make a data package.
> 
> NB: Using xz compression and some cleanup I got it down to 21 MB, from 49 MB
> - so it's not huge but I don't think I can get to <4 MB this way.
> 
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> 

