[Bioc-devel] Bioconductor data packages containing very large files

Vincent Carey @tvjc @end|ng |rom ch@nn|ng@h@rv@rd@edu
Thu Jan 5 13:14:37 CET 2023


I would like us to discuss this in the context of the HDF Scalable Data
Service that we have running on the NSF cloud in Jetstream2.  Let's discuss
offline and then report back to the list.


On Thu, Jan 5, 2023 at 7:11 AM Heery Richard <Richard.Heery using ieo.it> wrote:

> Thank you Lori.
>
> I was wondering if there is someone I could ask whether what I am working
> on would be of interest and suitable for Bioconductor before I invest more
> time developing it and uploading the data? What I have is an HDF5 file
> constructed using the Bioconductor methrix package for all of the
> methylation data for TCGA downloaded from GDC. It allows rapid querying of
> methylation data for regions of interest (e.g. enhancers or promoters)
> provided as a GRanges object across the ~10,000 samples in TCGA, completing
> a pan-cancer analysis in minutes. Otherwise, downloading and querying data
> for regions of interest in several cancer types could take several days or
> longer.
>
> There are already packages for downloading TCGA data on Bioconductor, but
> what I think is novel here is the speed and ease with which methylation
> data can be retrieved for a large number of samples.
>
> Please let me know if this is something that could be useful for
> Bioconductor.
>
> Best wishes,
>
> Richard
> ________________________________
> From: Kern, Lori <Lori.Shepherd using RoswellPark.org>
> Sent: Wednesday, January 4, 2023 3:22 PM
> To: Heery Richard <Richard.Heery using ieo.it>; bioc-devel using r-project.org <
> bioc-devel using r-project.org>
> Subject: Re: Bioconductor data packages containing very large files
>
> A package cannot be that large directly.  Please see information on
> creating an ExperimentHub package to provide the data for use in the
> package:
>
>
> https://bioconductor.org/packages/release/bioc/vignettes/HubPub/inst/doc/CreateAHubPackage.html
>
> Cheers,
>
> Lori Shepherd - Kern
> Bioconductor Core Team
> Roswell Park Comprehensive Cancer Center
> Department of Biostatistics & Bioinformatics
> Elm & Carlton Streets
> Buffalo, New York 14263
>
> ________________________________
> From: Bioc-devel <bioc-devel-bounces using r-project.org> on behalf of Heery
> Richard <Richard.Heery using ieo.it>
> Sent: Wednesday, January 4, 2023 9:07 AM
> To: bioc-devel using r-project.org <bioc-devel using r-project.org>
> Subject: [Bioc-devel] Bioconductor data packages containing very large
> files
>
> Hi Bioconductor,
>
> I have made a database that is 27 GB that I would like to share as part of
> a Bioconductor package and I was just wondering if it is possible to submit
> very large files like this to Bioconductor or if there may be any
> alternative ways of sharing the file as part of a package?
>
> Best wishes,
>
> Richard Heery
>
> IEO, Milan, Italy
>
> _______________________________________________
> Bioc-devel using r-project.org mailing list
>
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
>
