[Bioc-devel] Bioconductor data packages containing very large files
Heery Richard
Richard.Heery using ieo.it
Wed Jan 4 22:19:20 CET 2023
Thank you Lori.
Is there someone I could ask whether what I am working on would be of interest and suitable for Bioconductor before I invest more time in developing it and uploading the data? What I have is an HDF5 file, constructed using the Bioconductor methrix package, containing all of the methylation data for TCGA downloaded from GDC. It allows rapid querying of methylation data for regions of interest (e.g. enhancers or promoters), provided as a GRanges object, across the ~10,000 samples in TCGA, so a pan-cancer analysis completes in minutes. Otherwise, downloading and querying data for regions of interest in several cancer types could take several days or longer.
There are already packages for downloading TCGA data on Bioconductor, but what I think is novel here is the speed and ease with which methylation data can be retrieved for a large number of samples.
Please let me know if this is something that could be useful for Bioconductor.
Best wishes,
Richard
________________________________
From: Kern, Lori <Lori.Shepherd using RoswellPark.org>
Sent: Wednesday, January 4, 2023 3:22 PM
To: Heery Richard <Richard.Heery using ieo.it>; bioc-devel using r-project.org <bioc-devel using r-project.org>
Subject: Re: Bioconductor data packages containing very large files
A package cannot be that large. Please see the information on creating an ExperimentHub package to provide the data for use in the package:
https://bioconductor.org/packages/release/bioc/vignettes/HubPub/inst/doc/CreateAHubPackage.html
Cheers,
Lori Shepherd - Kern
Bioconductor Core Team
Roswell Park Comprehensive Cancer Center
Department of Biostatistics & Bioinformatics
Elm & Carlton Streets
Buffalo, New York 14263
________________________________
From: Bioc-devel <bioc-devel-bounces using r-project.org> on behalf of Heery Richard <Richard.Heery using ieo.it>
Sent: Wednesday, January 4, 2023 9:07 AM
To: bioc-devel using r-project.org <bioc-devel using r-project.org>
Subject: [Bioc-devel] Bioconductor data packages containing very large files
Hi Bioconductor,
I have built a 27 GB database that I would like to share as part of a Bioconductor package. Is it possible to submit files this large to Bioconductor, or is there an alternative way of sharing the file as part of a package?
Best wishes,
Richard Heery
IEO, Milan, Italy
_______________________________________________
Bioc-devel using r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel