[Bioc-devel] httr::GET() problem downloading a ExperimentHub resource

Martin Morgan mtmorgan.bioc using gmail.com
Thu Mar 30 03:09:12 CEST 2023


Some more not-necessarily helpful observations. You can get verbose output with

curl::curl_fetch_disk(url, tempfile(), handle = curl::new_handle(verbose = TRUE))

and on the command line with curl -v -L ...

Also, it seems that other BAM files can be downloaded, e.g., eh[["EH3502"]] (also httr::with_verbose(eh[["EH3502"]])). It would be worthwhile to verify this a little more thoroughly; I looked for candidates with

mcols(eh) |> tibble::as_tibble(rownames = "ehid") |> dplyr::filter(sourcetype == "BAM", rdataclass == "BamFile")

If it's true that other BAM files are ok, then it points to the way the files are being served on "your" end.

One difference I see is that "your" files have Content-Encoding: gzip, but there is no Content-Encoding tag on the BAM file above. I guess BAM files are (some flavor of) gzip (?), but maybe this is confusing the R curl library...
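
To compare the response headers of two URLs directly, something like the following might help. It is only a sketch: header_value() and fetch_header_lines() are made-up helper names, and it assumes the server answers a headers-only (HEAD) request with the same headers it would send for a normal download.

```r
# Pure helper: pull one header value out of a character vector of
# "Name: value" lines; returns NA when the header is absent.
header_value <- function(header_lines, name) {
  pattern <- paste0("^", name, ":[[:space:]]*")
  hit <- grep(pattern, header_lines, ignore.case = TRUE, value = TRUE)
  if (length(hit) == 0L) return(NA_character_)
  trimws(sub(pattern, "", hit[[1L]], ignore.case = TRUE))
}

# Network helper (hypothetical name): request headers only and return the
# header lines of the final response as a character vector.
fetch_header_lines <- function(url) {
  h <- curl::new_handle(nobody = TRUE)  # nobody = TRUE: do not download the body
  res <- curl::curl_fetch_memory(url, handle = h)
  curl::parse_headers(res$headers)
}

# Usage (needs network access, so not run here):
# header_value(fetch_header_lines(url), "Content-Encoding")
```

Comparing the value for one of the failing BAM URLs with a BAM resource that downloads fine would show whether the header really differs.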

Martin

From: Robert Castelo <robert.castelo using upf.edu>
Date: Wednesday, March 29, 2023 at 4:08 PM
To: Martin Morgan <mtmorgan.bioc using gmail.com>, bioc-devel using r-project.org <bioc-devel using r-project.org>
Subject: Re: [Bioc-devel] httr::GET() problem downloading a ExperimentHub resource
good catch, but really enigmatic, BAI files work, but BAM don't:

dat <- read.csv("https://raw.githubusercontent.com/functionalgenomics/gDNAinRNAseqData/devel/inst/extdata/metadata_LiYu22subsetBAMfiles.csv")
rdatapath <- strsplit(dat$RDataPath, ":")
bamfiles <- unlist(rdatapath)[seq(1, 18, 2)]
baifiles <- unlist(rdatapath)[seq(2, 18, 2)]

bamurls <- paste0(dat$Location_Prefix, bamfiles)
baiurls <- paste0(dat$Location_Prefix, baifiles)

## BAM files give error
for (bf in bamurls) {
  cat(sprintf("%s\n", basename(bf)))
  tryCatch({
    curl::curl_fetch_disk(bf, tempfile())
  }, error=function(e) message(paste0(e, "\n")))
}

## BAI files do not give error
for (bf in baiurls) {
  cat(sprintf("%s\n", basename(bf)))
  tryCatch({
    curl::curl_fetch_disk(bf, tempfile())
  }, error=function(e) message(paste0(e, "\n")))
}
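
The two loops above could also be folded into one helper that records the outcome per file, which makes the BAM/BAI pattern easier to see at a glance. A minimal sketch, assuming a hypothetical try_downloads() helper; the fetch argument is only there so the function can be tried without network access.

```r
# Try each URL and record whether the download worked.
# `fetch` defaults to the same curl call used in the loops above; it is
# an argument so that a stand-in function can be supplied for testing.
try_downloads <- function(urls,
                          fetch = function(u) curl::curl_fetch_disk(u, tempfile())) {
  rows <- lapply(urls, function(u) {
    # NA means "no error"; otherwise keep the error message text.
    msg <- tryCatch({
      fetch(u)
      NA_character_
    }, error = function(e) conditionMessage(e))
    data.frame(file = basename(u), ok = is.na(msg), message = msg,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Usage (needs network access, so not run here):
# try_downloads(c(bamurls, baiurls))
```

The result is a data frame with one row per URL, so failures can be tabulated by file extension.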

any further idea??

robert.

On 29/3/23 21:10, Martin Morgan wrote:
Not really helpful but this could be simplified a bit by removing the redirect from experiment hub, and the layer from httr to curl, so

url = "https://functionalgenomics.upf.edu/experimenthub/gdnainrnaseqdata/LiYu22subsetBAMfiles/s32gDNA0.bam"
curl::curl_fetch_disk(url, tempfile())
Error in curl::curl_fetch_disk("https://functionalgenomics.upf.edu/experimenthub/gdnainrnaseqdata/LiYu22subsetBAMfiles/s32gDNA0.bam",  :
  Failed writing received data to disk/application
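
One experiment that might narrow this down is to repeat the download with libcurl's automatic content decoding switched off. This is a sketch, not a fix: fetch_without_decoding() is a hypothetical helper, and it assumes the libcurl option CURLOPT_HTTP_CONTENT_DECODING is exposed under the name http_content_decoding in your curl build (curl::curl_options() lists the available names).

```r
# Download `url` to `dest` while asking libcurl NOT to decode the body
# according to any Content-Encoding header sent by the server.
fetch_without_decoding <- function(url, dest = tempfile()) {
  # Assumption: option name as listed by curl::curl_options()
  h <- curl::new_handle(http_content_decoding = 0L)
  curl::curl_fetch_disk(url, dest, handle = h)
}

# Usage (needs network access, so not run here):
# fetch_without_decoding(url)
```

If this succeeds where the plain curl_fetch_disk() call fails, that would suggest the error comes from decoding the response body rather than from the transfer itself.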

I notice the index file (extension .bai) works; do other BAM files work, too?

Martin

From: Bioc-devel <bioc-devel-bounces using r-project.org> on behalf of Robert Castelo <robert.castelo using upf.edu>
Date: Wednesday, March 29, 2023 at 1:18 PM
To: bioc-devel using r-project.org <bioc-devel using r-project.org>
Subject: [Bioc-devel] httr::GET() problem downloading a ExperimentHub resource
hi,

we recently added a few new ExperimentHub resources, consisting of BAM
files and their corresponding BAI files, hosted on my own server.
while it seems that they are accessible, they cannot be downloaded
through the ExperimentHub API. a minimal example reproducing the
problem is this one (using Bioc devel):

library(ExperimentHub)
httr::GET("https://experimenthub.bioconductor.org/fetch/8129")
Error in curl::curl_fetch_memory(url, handle = handle) :
   Failed writing received data to disk/application

while there's apparently no problem "manually" downloading the resource
with 'download.file()' and loading it with
'GenomicAlignments::readGAlignments()':

download.file("https://experimenthub.bioconductor.org/fetch/8129",
"file.bam")
trying URL 'https://experimenthub.bioconductor.org/fetch/8129'
Content type 'application/octet-stream' length 13296358 bytes (12.7 MB)
==================================================
downloaded 12.7 MB

gal <- GenomicAlignments::readGAlignments("file.bam")
gal[1:3]
GAlignments object with 3 alignments and 0 metadata columns:
       seqnames strand       cigar    qwidth     start       end     width
          <Rle>  <Rle> <character> <integer> <integer> <integer> <integer>
   [1]     chr1      +       49M1S        50     16208     16256        49
   [2]     chr1      +       3S47M        50     16976     17022        47
   [3]     chr1      -  10M177N40M        50     17046     17272       227
           njunc
       <integer>
   [1]         0
   [2]         0
   [3]         1
   -------
   seqinfo: 2580 sequences from an unspecified genome
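
As an extra sanity check on a file written by download.file(), one could look at its magic bytes: a BAM file is BGZF-compressed, so on disk it starts with the gzip bytes 1f 8b, and once decompressed it starts with the four bytes "BAM\1". A rough sketch using only base R; looks_like_bam() is a made-up name.

```r
# TRUE if `path` is gzip-compressed on disk AND its decompressed content
# starts with the BAM magic string "BAM\1".
looks_like_bam <- function(path) {
  # Raw bytes on disk: a gzip/BGZF stream starts with 1f 8b.
  magic <- readBin(path, what = "raw", n = 2L)
  if (!identical(magic, as.raw(c(0x1f, 0x8b)))) return(FALSE)
  # Decompressed stream: one layer of gzip is removed by gzfile().
  con <- gzfile(path, open = "rb")
  on.exit(close(con))
  head4 <- readBin(con, what = "raw", n = 4L)
  identical(head4, as.raw(c(0x42, 0x41, 0x4d, 0x01)))  # "B" "A" "M" 0x01
}

# Usage:
# looks_like_bam("file.bam")
```

A file that had been compressed a second time on the way from the server would fail the second check, because removing one gzip layer would reveal another gzip header instead of "BAM\1".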

any hint why 'httr::GET()' fails, while 'download.file()' doesn't?

thanks!!

robert.
ps: just to clarify, the 'httr::GET()' example is behind the following
problem:

eh <- ExperimentHub()
z <- eh[["EH8079"]]
see ?gDNAinRNAseqData and browseVignettes('gDNAinRNAseqData') for
documentation
downloading 2 resources
retrieving 2 resources
|======================================================================| 100%

Error: failed to load resource
   name: EH8079
   title: RNA-seq data BAM file subset of HRR589632 contaminated with 0% gDNA
   reason: 1 resources failed to download
In addition: Warning messages:
1: download failed
   web resource path: ‘https://experimenthub.bioconductor.org/fetch/8129’
   local file path: ‘/home/rcastelo/.cache/R/ExperimentHub/12ba1aa03_8129’
   reason: Failed writing received data to disk/application
2: bfcadd() failed; resource removed
   rid: BFC3
   fpath: ‘https://experimenthub.bioconductor.org/fetch/8129’
   reason: download failed
3: download failed
   hub path: ‘https://experimenthub.bioconductor.org/fetch/8129’
   cache resource: ‘EH8079 : 8129’
   reason: bfcadd() failed; see warnings()


        [[alternative HTML version deleted]]

_______________________________________________
Bioc-devel using r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--

Robert Castelo, PhD

Associate Professor

Dept. of Medicine and Life Sciences

Universitat Pompeu Fabra (UPF)

Barcelona Biomedical Research Park (PRBB)

Dr Aiguader 88

E-08003 Barcelona, Spain

telf: +34.933.160.514
