[Bioc-devel] httr::GET() problem downloading a ExperimentHub resource

Robert Castelo robert.castelo using upf.edu
Thu Mar 30 10:26:13 CEST 2023


Thanks Martin, this has been really helpful. I've reported your 
observations to our sysadmins, and they fixed it by modifying the Apache 
config file on our server, replacing the line:

AddType application/x-gzip .gz .tgz

with

AddType application/x-gzip .gz .tgz .bam

and now it works. I'm posting this in case somebody else runs into the 
same problem in the future.
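
In case it helps anyone checking their own server, here is a minimal sketch 
for looking at the Content-Encoding header that Martin pointed out (quoted 
below). The helper names content_encoding() and head_lines() are mine, not 
part of any package, and the example header lines are hand-written for 
illustration rather than captured from our server. head_lines() assumes the 
curl package is installed and that the server answers HEAD requests.

```r
# Return the Content-Encoding value from a vector of HTTP header lines,
# or NA if the header is absent. Header names are case-insensitive.
content_encoding <- function(header_lines) {
  hit <- grep("^content-encoding:", header_lines,
              ignore.case = TRUE, value = TRUE)
  if (length(hit) == 0L) return(NA_character_)
  # keep the last match and strip the header name and padding
  trimws(sub("^[^:]*:", "", hit[length(hit)]))
}

# Fetch only the response headers (HEAD request) with the curl package.
# Needs network access; defined here but not called.
head_lines <- function(url) {
  h <- curl::new_handle(nobody = TRUE, followlocation = TRUE)
  res <- curl::curl_fetch_memory(url, handle = h)
  curl::parse_headers(res$headers)
}

# Hand-written header lines (illustrative only):
content_encoding(c("HTTP/1.1 200 OK", "Content-Type: application/x-gzip"))
content_encoding(c("HTTP/1.1 200 OK", "Content-Encoding: gzip"))
```

With network access, content_encoding(head_lines(url)) on one of the BAM 
URLs should show whether the server still announces a gzip encoding.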

As far as I know, BAM files do indeed use a block-compression format 
(BGZF) that is compatible with gzip, so I guess curl needs to know about 
this, and I guess the Microsoft Azure Data Lake cloud is already 
configured to serve that information, which is why the BAM files there 
were downloading fine.
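
To illustrate the gzip compatibility, here is a rough base-R sketch that 
checks whether a file starts like a BGZF block: the gzip magic bytes, the 
FEXTRA flag, and a 'BC' extra subfield, as laid out in the SAM/BAM 
specification. The function name is_bgzf() is hypothetical, and the sketch 
assumes the 'BC' subfield is the first extra subfield, which is the usual 
layout but not something I have verified for every writer.

```r
# TRUE if the file starts with a gzip member whose first extra subfield
# is the BGZF 'BC' tag; FALSE otherwise (including for short files).
is_bgzf <- function(path) {
  b <- readBin(path, what = "raw", n = 16L)
  if (length(b) < 16L) return(FALSE)
  has_extra <- bitwAnd(as.integer(b[4]), 4L) != 0L        # FLG.FEXTRA bit
  identical(b[1:3], as.raw(c(0x1f, 0x8b, 0x08))) &&       # gzip magic, deflate
    has_extra &&
    identical(b[13:16], as.raw(c(0x42, 0x43, 0x02, 0x00)))  # 'B', 'C', SLEN = 2
}

# Self-contained check on synthetic headers (not real BAM data):
bgzf_header <- as.raw(c(0x1f, 0x8b, 0x08, 0x04, 0, 0, 0, 0, 0, 0xff,
                        0x06, 0x00, 0x42, 0x43, 0x02, 0x00))
plain_gzip_header <- as.raw(c(0x1f, 0x8b, 0x08, 0x00, rep(0, 12)))
f1 <- tempfile(); writeBin(bgzf_header, f1)
f2 <- tempfile(); writeBin(plain_gzip_header, f2)
is_bgzf(f1)
is_bgzf(f2)
```

Running is_bgzf() on a downloaded BAM file (for example the "file.bam" from 
the download.file() call quoted below) is one way to confirm that the 
content itself is block-gzipped, independently of what the server says.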

robert.

On 3/30/23 03:09, Martin Morgan wrote:
>
> Some more not-necessarily helpful observations. You can get verbose 
> output with
>
> curl::curl_fetch_disk(url, tempfile(),
>   handle = curl::new_handle(verbose = TRUE))
>
> and on the command line with curl -v -L …
>
> Also, it seems that other BAM files can be downloaded, e.g., from 
> eh[["EH3502"]] (also httr::with_verbose(eh[["EH3502"]])). It would be 
> worthwhile verifying this a little more completely; I looked for
>
> mcols(eh) |> as_tibble(rownames = "ehid") |>
>   filter(sourcetype == "BAM", rdataclass == "BamFile")
>
> If it’s true that other BAM files are ok, then it points to the way 
> the files are being served on ‘your’ end.
>
> One difference I see is that ‘your’ files have Content-Encoding: gzip, 
> but there is no Content-Encoding tag on the BAM file above. I guess 
> BAM files are (some flavor of) gzip (?), but maybe this is confusing 
> the R curl library…
>
> Martin
>
> From: Robert Castelo <robert.castelo using upf.edu>
> Date: Wednesday, March 29, 2023 at 4:08 PM
> To: Martin Morgan <mtmorgan.bioc using gmail.com>,
> bioc-devel using r-project.org
> Subject: Re: [Bioc-devel] httr::GET() problem downloading a 
> ExperimentHub resource
>
> good catch, but really enigmatic, BAI files work, but BAM don't:
>
> dat <- 
>   read.csv("https://raw.githubusercontent.com/functionalgenomics/gDNAinRNAseqData/devel/inst/extdata/metadata_LiYu22subsetBAMfiles.csv")
> rdatapath <- strsplit(dat$RDataPath, ":")
> bamfiles <- unlist(rdatapath)[seq(1, 18, 2)]
> baifiles <- unlist(rdatapath)[seq(2, 18, 2)]
>
> bamurls <- paste0(dat$Location_Prefix, bamfiles)
> baiurls <- paste0(dat$Location_Prefix, baifiles)
>
> ## BAM files give error
> for (bf in bamurls) {
>   cat(sprintf("%s\n", basename(bf)))
>   tryCatch({
>     curl::curl_fetch_disk(bf, tempfile())
>   }, error=function(e) message(conditionMessage(e)))
> }
>
> ## BAI files do not give error
> for (bf in baiurls) {
>   cat(sprintf("%s\n", basename(bf)))
>   tryCatch({
>     curl::curl_fetch_disk(bf, tempfile())
>   }, error=function(e) message(conditionMessage(e)))
> }
>
> any further idea??
>
> robert.
>
> On 29/3/23 21:10, Martin Morgan wrote:
>
>     Not really helpful but this could be simplified a bit by removing
>     the redirect from experiment hub, and the layer from httr to curl, so
>
>     url =
>     "https://functionalgenomics.upf.edu/experimenthub/gdnainrnaseqdata/LiYu22subsetBAMfiles/s32gDNA0.bam"
>
>     curl::curl_fetch_disk(url, tempfile())
>
>     Error in
>     curl::curl_fetch_disk("https://functionalgenomics.upf.edu/experimenthub/gdnainrnaseqdata/LiYu22subsetBAMfiles/s32gDNA0.bam",
>     :
>
>       Failed writing received data to disk/application
>
>     I notice the index file (extension .bai) works; do other BAM files
>     work, too?
>
>     Martin
>
>     From: Bioc-devel <bioc-devel-bounces using r-project.org> on behalf of
>     Robert Castelo <robert.castelo using upf.edu>
>     Date: Wednesday, March 29, 2023 at 1:18 PM
>     To: bioc-devel using r-project.org
>     Subject: [Bioc-devel] httr::GET() problem downloading a
>     ExperimentHub resource
>
>     hi,
>
>     we recently added a few new ExperimentHub resources, consisting of
>     BAM files and their corresponding BAI files, hosted on my own server.
>     while they seem to be accessible, they cannot be downloaded
>     through the ExperimentHub API. a minimal example reproducing the
>     problem is this one (using Bioc devel):
>
>     library(ExperimentHub)
>     httr::GET("https://experimenthub.bioconductor.org/fetch/8129")
>     Error in curl::curl_fetch_memory(url, handle = handle) :
>        Failed writing received data to disk/application
>
>     while there's apparently no problem "manually" downloading the
>     resource using 'download.file()' and loading it with
>     'GenomicAlignments::readGAlignments()':
>
>     download.file("https://experimenthub.bioconductor.org/fetch/8129",
>     "file.bam")
>     trying URL 'https://experimenthub.bioconductor.org/fetch/8129'
>     Content type 'application/octet-stream' length 13296358 bytes
>     (12.7 MB)
>     ==================================================
>     downloaded 12.7 MB
>
>     gal <- GenomicAlignments::readGAlignments("file.bam")
>     gal[1:3]
>     GAlignments object with 3 alignments and 0 metadata columns:
>            seqnames strand       cigar    qwidth     start       end     width
>               <Rle>  <Rle> <character> <integer> <integer> <integer> <integer>
>        [1]     chr1      +       49M1S        50     16208     16256        49
>        [2]     chr1      +       3S47M        50     16976     17022        47
>        [3]     chr1      -  10M177N40M        50     17046     17272       227
>                njunc
>            <integer>
>        [1]         0
>        [2]         0
>        [3]         1
>        -------
>        seqinfo: 2580 sequences from an unspecified genome
>
>     any hint why 'httr::GET()' fails, while 'download.file()' doesn't?
>
>     thanks!!
>
>     robert.
>     ps: just to clarify, the 'httr::GET()' example is behind the
>     following problem:
>
>     eh <- ExperimentHub()
>     z <- eh[["EH8079"]]
>     see ?gDNAinRNAseqData and browseVignettes('gDNAinRNAseqData') for
>     documentation
>     downloading 2 resources
>     retrieving 2 resources
>     |======================================================================| 100%
>
>     Error: failed to load resource
>        name: EH8079
>        title: RNA-seq data BAM file subset of HRR589632 contaminated with 0% gDNA
>        reason: 1 resources failed to download
>     In addition: Warning messages:
>     1: download failed
>        web resource path:
>     ‘https://experimenthub.bioconductor.org/fetch/8129’
>        local file path:
>     ‘/home/rcastelo/.cache/R/ExperimentHub/12ba1aa03_8129’
>        reason: Failed writing received data to disk/application
>     2: bfcadd() failed; resource removed
>        rid: BFC3
>        fpath: ‘https://experimenthub.bioconductor.org/fetch/8129’
>        reason: download failed
>     3: download failed
>        hub path: ‘https://experimenthub.bioconductor.org/fetch/8129’
>        cache resource: ‘EH8079 : 8129’
>        reason: bfcadd() failed; see warnings()
>

-- 
Robert Castelo, PhD
Associate Professor
Dept. of Medicine and Life Sciences
Universitat Pompeu Fabra (UPF)
Barcelona Biomedical Research Park (PRBB)
Dr Aiguader 88
E-08003 Barcelona, Spain
telf: +34.933.160.514
