[Bioc-devel] Handling larger data in vignette

Sarah Williams
Mon Jul 30 02:16:33 CEST 2018

Thanks Martin,
I'll give the ExperimentHub package method a try then, if I can use it for
vignette-only processed data. I think I was getting confused with
AnnotationHub (which I gather is intended for
useful-for-other-packages-or-applications raw data).
Time should be OK - I just want to skip the long step on full-sized data.
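
For reference, a minimal sketch of the BiocFileCache option described in the
quoted message below. The URL, the package name "myPackage" and the .rds file
format are placeholders for illustration, not anything from this thread, and
tools::R_user_dir() assumes R >= 4.0.

```r
library(BiocFileCache)

# Hypothetical URL: substitute the real location of a public dataset.
data_url <- "https://example.org/path/to/public_dataset.rds"

# Package-specific cache directory ("myPackage" is a placeholder name).
cache_dir <- tools::R_user_dir("myPackage", which = "cache")
bfc <- BiocFileCache(cache_dir, ask = FALSE)

# bfcrpath() returns the local path of the resource; if the resource is not
# yet in the cache it is downloaded and registered first.
local_path <- bfcrpath(bfc, data_url)

# Assumes the file was saved with saveRDS().
dataset <- readRDS(local_path)
```

The download cost is only paid on the first build; later builds read the
cached copy, although the overall evaluation time limit still applies.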

On 27 July 2018 at 19:30, Martin Morgan <martin.morgan using roswellpark.org>
wrote:

> Remember also that there are overall evaluation time limits.
> One strategy might use existing stable publicly available SC data sets
> from e.g.,
>   http://imlspenticton.uzh.ch:3838/conquer/
>   https://hemberg-lab.github.io/scRNA.seq.datasets/
> These could be downloaded using BiocFileCache as a first step in the
> vignette; the download cost would be 'paid' the first time, but subsequent
> use would be from the locally cached data.
> A second approach would be to create an ExperimentHub package, and to use
> that in your examples.
>   http://bioconductor.org/packages/devel/ExperimentHub
>   http://bioconductor.org/packages/devel/bioc/vignettes/ExperimentHub/inst/doc/CreateAnExperimentHubPackage.html
> The submission process would start by submitting the EH package, and then
> adding, once the kinks in the experiment data package are worked out, the
> software package to the issue. The data and software packages would be
> accepted together.
> Martin
> On 07/27/2018 02:38 AM, Sarah Williams via Bioc-devel wrote:
>> Hi,
>> I'm preparing a package for submission to Bioconductor, but hitting the
>> 4 MB size limit due to examples in my vignette.
>> I do have a demo toy-sized dataset which I use for the bulk of the
>> vignette. But I wanted to show real-data examples at the end because the
>> approach doesn't work well on toy-sized data.
>> Conceivably everything except for the 'examples' section would make a
>> 'complete' vignette (but probably not a very helpful one...). I'm
>> wondering if I should static-ify just those examples? Might hit the 50%
>> runnable code-chunk limit then though. Unfortunately it's a rank-based
>> approach so I can't really take the top 100 genes for these particular
>> objects.
>> Not sure how best to solve this, any tips/suggestions? Thanks!
>> (The problematic vignette is here:
>> https://bioinformatics.erc.monash.edu/home/sarah.williams/projects/cell_groupings/doco/celaref_doco.html
>> )
>> NB: To make matters worse this is a tool for comparing datasets - so I
>> have multiples! They are public datasets and I haven't done anything
>> exciting with them (nor would anyone else want to reuse processed
>> objects) - so I don't think that I should make a data package.
>> NB: Using xz compression and some cleanup I got it down to 21 MB, from
>> 49 MB - so it's not huge but I don't think I can get to <4 MB this way.
