[Bioc-devel] Best practices to load data for vignette/tests

Tue Jan 22 15:13:23 CET 2019

You could see if there is any existing data already in Bioconductor for use with your package.  That would be preferable.

http://bioconductor.org/packages/release/BiocViews.html#___Software

Searching there for fastq, you could see what data ShortRead, seqTools, and FastqCleaner provide.

Similarly, you could search for RNA-seq packages to see whether any of their data is appropriate.
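
As a rough sketch of how to check what an installed package ships: example files usually live under the package's extdata directory, and base R's system.file() and list.files() can list them. The helper name list_example_files and its default pattern are only for illustration, and ShortRead has to be installed for the commented call to return anything.

```r
# List example files shipped in an installed package's 'extdata' directory.
# 'list_example_files' is a hypothetical helper name, not a Bioconductor function.
list_example_files <- function(pkg, pattern = "\\.(fastq|fq)(\\.gz)?$") {
  # system.file() returns "" when the package or the directory is not found
  extdata <- system.file("extdata", package = pkg)
  if (!nzchar(extdata)) {
    return(character(0))
  }
  list.files(extdata, pattern = pattern, recursive = TRUE, full.names = TRUE)
}

# Example (assumes ShortRead is installed):
# list_example_files("ShortRead")
```

If a suitable file turns up, the vignette and tests can read it straight from the installed package, with no download.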

There are also a number of experiment data packages that may provide the data format you are in need of.

http://bioconductor.org/packages/release/BiocViews.html#___ExperimentData

You could search here as well.

Lastly, Bioconductor has ExperimentHub for storing large data files. You can search it interactively in R or through the web API interface here:

https://experimenthub.bioconductor.org/
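
Searching interactively in R could look roughly like this. It assumes the ExperimentHub package is installed and an internet connection is available. The search terms are only an illustration, and find_hub_data is a made-up helper name.

```r
# Query ExperimentHub for records matching all of the given search terms.
# 'find_hub_data' is a hypothetical helper name used for illustration only.
find_hub_data <- function(terms) {
  if (!requireNamespace("ExperimentHub", quietly = TRUE)) {
    stop("Please install ExperimentHub, e.g. BiocManager::install('ExperimentHub')")
  }
  eh <- ExperimentHub::ExperimentHub()   # connects to the hub, needs internet
  AnnotationHub::query(eh, terms)        # records whose metadata match the terms
}

# Example (illustrative search terms):
# hits <- find_hub_data(c("RNA-Seq", "fastq"))
# hits                             # prints a summary of the matching records
# res <- hits[[names(hits)[1]]]    # if there is a match: download and cache it
```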

If none of those locations provides data that is suitable for your package, you can submit your own data to ExperimentHub.

http://bioconductor.org/packages/devel/bioc/vignettes/ExperimentHub/inst/doc/CreateAnExperimentHubPackage.html

You could download the files directly, but this could be time consuming depending on internet connections and download speeds. The Bioconductor hubs provide a caching mechanism, so a file is downloaded only once and its location on the system is remembered for later use.
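
To illustrate the download-once idea for files hosted elsewhere, here is a minimal base R sketch. It is not how the hubs implement their cache. The helper name fetch_once, the cache location, and the example URL are all assumptions, and tools::R_user_dir() needs R 4.0.0 or later. The BiocFileCache package provides a more complete version of this idea.

```r
# Download 'url' into 'cache_dir' only if it is not already there, and return
# the local path. 'fetch_once' and "myPackage" are hypothetical names.
fetch_once <- function(url,
                       cache_dir = tools::R_user_dir("myPackage", which = "cache")) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  dest <- file.path(cache_dir, basename(url))
  if (!file.exists(dest)) {
    # Download to a temporary name first so that an interrupted download
    # is never mistaken for a complete cached file.
    tmp <- paste0(dest, ".part")
    status <- utils::download.file(url, destfile = tmp, mode = "wb", quiet = TRUE)
    if (status != 0) {
      unlink(tmp)
      stop("Download failed for ", url)
    }
    file.rename(tmp, dest)
  }
  dest
}

# Example (placeholder URL, not a real data file):
# fastq <- fetch_once("https://example.org/reads.fastq.gz")
```

The first call downloads the file. Later calls with the same URL return the cached path without touching the network.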

Cheers,

Lori Shepherd

Bioconductor Core Team

Roswell Park Cancer Institute

Department of Biostatistics & Bioinformatics

Elm & Carlton Streets

Buffalo, New York 14263

________________________________
From: Bioc-devel <bioc-devel-bounces using r-project.org> on behalf of Julien Wollbrett <julien.wollbrett using unil.ch>
Sent: Tuesday, January 22, 2019 8:57:23 AM
To: bioc-devel using r-project.org
Subject: [Bioc-devel] Best practices to load data for vignette/tests

Hi everyone,

I am currently working on an R package called BgeeCall that automatically
generates present/absent expression calls from any RNA-Seq fastq files,
as long as the species is present in Bgee (https://bgee.org/).
The package is almost ready and I am currently writing the vignette and
some tests.

This package can be seen as a workflow taking as input one transcriptome
and at least one fastq file.

My question is how can I import these 2 files to run the vignette/tests?
They are too big to be part of my package.
Can I directly download them from SRA and ensembl (or from my own
server)? Do I need to create a dataset that will be loaded by my package
for this kind of raw and publicly available data?
Do you know if I could reuse some already existing dataset? I am
interested in any best-practices information.
Thank you for your answers.

Best Regards,

Julien

_______________________________________________
Bioc-devel using r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
