[Bioc-devel] httr::GET() problem downloading a ExperimentHub resource
Martin Morgan
mtmorgan.bioc using gmail.com
Thu Mar 30 03:09:12 CEST 2023
Some more not-necessarily helpful observations. You can get verbose output with
curl::curl_fetch_disk(url, tempfile(), handle = curl::new_handle(verbose = TRUE))
and on the command line with curl -v -L …
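To look only at the response headers (for example, to check whether a Content-Encoding header is sent) without downloading the whole body, a HEAD request can be issued from R. This is a minimal sketch, not code from the thread: header_value() and head_headers() are hypothetical helpers, and some servers may answer HEAD requests slightly differently from GET requests.

```r
library(curl)

# Hypothetical helper: value of one response header (case-insensitive), or NA
# if the server did not send it. parse_headers_list() returns a named list
# whose names are the lower-cased header field names.
header_value <- function(headers, name) {
  hdrs <- curl::parse_headers_list(headers)
  val <- hdrs[[tolower(name)]]
  if (is.null(val)) NA_character_ else val
}

# Hypothetical helper: issue a HEAD request (nobody = TRUE) so that only the
# headers are transferred, following redirects such as the ExperimentHub one.
head_headers <- function(url) {
  h <- curl::new_handle(nobody = TRUE, followlocation = TRUE)
  curl::curl_fetch_memory(url, handle = h)$headers
}

## Usage (requires network access):
## url <- "https://functionalgenomics.upf.edu/experimenthub/gdnainrnaseqdata/LiYu22subsetBAMfiles/s32gDNA0.bam"
## header_value(head_headers(url), "Content-Encoding")
```

Running the same two lines against a BAM file that downloads fine makes the header difference easy to compare side by side.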
Also, it seems that other BAM files can be downloaded, e.g., eh[["EH3502"]] (also httr::with_verbose(eh[["EH3502"]])). It would be worthwhile to verify this more thoroughly; I looked for
library(ExperimentHub)
library(dplyr)
eh <- ExperimentHub()
mcols(eh) |> as_tibble(rownames = "ehid") |> filter(sourcetype == "BAM", rdataclass == "BamFile")
If it's true that other BAM files are OK, then it points to the way the files are being served on 'your' end.
One difference I see is that 'your' files have Content-Encoding: gzip, but there is no Content-Encoding header on the BAM file above. I guess BAM files are (some flavor of) gzip (?), but maybe this is confusing the R curl library…
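The guess is close: according to the SAM/BAM format specification, a BAM file is BGZF-compressed, i.e. a series of concatenated gzip members, each with an extra 'BC' subfield recording the block size. A minimal sketch for checking that a downloaded file still starts with such a block (rather than having been altered in transit) could look like this. is_bgzf() is a hypothetical helper; it only inspects the first 18 bytes and assumes 'BC' is the first extra subfield, as in files written by htslib/samtools.

```r
# Hypothetical helper: TRUE if 'path' starts with a BGZF block header as laid
# out in the SAM/BAM specification: a gzip member (ID1 = 31, ID2 = 139,
# CM = 8) with the FEXTRA flag set (FLG = 4), whose extra field begins with
# the 'BC' subfield (SI1 = 66, SI2 = 67, SLEN = 2).
is_bgzf <- function(path) {
  b <- as.integer(readBin(path, what = "raw", n = 18L))
  if (length(b) < 18L) return(FALSE)
  all(b[1:4] == c(31L, 139L, 8L, 4L)) &&   # gzip magic, deflate, FEXTRA
    b[13] == 66L && b[14] == 67L &&          # subfield id 'B', 'C'
    b[15] == 2L && b[16] == 0L               # SLEN = 2 (little-endian)
}

## Usage, e.g. on the file fetched with download.file():
## is_bgzf("file.bam")
```

An ordinary single-member .gz file (FLG = 0) is rejected, which is the distinction that matters here: BGZF is valid gzip, but not every gzip file is BGZF.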
Martin
From: Robert Castelo <robert.castelo using upf.edu>
Date: Wednesday, March 29, 2023 at 4:08 PM
To: Martin Morgan <mtmorgan.bioc using gmail.com>, bioc-devel using r-project.org <bioc-devel using r-project.org>
Subject: Re: [Bioc-devel] httr::GET() problem downloading a ExperimentHub resource
good catch, but it's really enigmatic: BAI files work, but BAM files don't:
dat <- read.csv("https://raw.githubusercontent.com/functionalgenomics/gDNAinRNAseqData/devel/inst/extdata/metadata_LiYu22subsetBAMfiles.csv")
rdatapath <- strsplit(dat$RDataPath, ":")
## each RDataPath entry is '<bam>:<bai>'; take the 1st and 2nd part of each
bamfiles <- vapply(rdatapath, `[`, character(1), 1)
baifiles <- vapply(rdatapath, `[`, character(1), 2)
bamurls <- paste0(dat$Location_Prefix, bamfiles)
baiurls <- paste0(dat$Location_Prefix, baifiles)
## BAM files give an error
for (bf in bamurls) {
  cat(sprintf("%s\n", basename(bf)))
  tryCatch({
    curl::curl_fetch_disk(bf, tempfile())
  }, error = function(e) message(conditionMessage(e)))
}
## BAI files do not give an error
for (bf in baiurls) {
  cat(sprintf("%s\n", basename(bf)))
  tryCatch({
    curl::curl_fetch_disk(bf, tempfile())
  }, error = function(e) message(conditionMessage(e)))
}
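Instead of printing messages, the two loops can collect their outcomes in one table, which makes the BAM/BAI contrast easier to compare and share. A minimal sketch; check_fetch() is a hypothetical helper. Note that curl_fetch_disk() raises an R error only for transfer-level failures such as the one reported here, while HTTP error codes (e.g. 404) are returned in status_code rather than raised.

```r
library(curl)

# Hypothetical helper: try to download each URL to a temporary file and
# record the outcome as one row per URL.
check_fetch <- function(urls) {
  rows <- lapply(urls, function(u) {
    dest <- tempfile()
    on.exit(unlink(dest))   # the downloaded copy is not kept
    status <- NA_integer_
    err <- tryCatch({
      status <- curl::curl_fetch_disk(u, dest)$status_code
      NA_character_
    }, error = function(e) conditionMessage(e))
    data.frame(file = basename(u),
               ok = is.na(err),   # no transfer error; check 'status' too
               status = status, error = err, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

## Usage, with bamurls/baiurls as defined above:
## rbind(check_fetch(bamurls), check_fetch(baiurls))
```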
any further ideas?
robert.
On 29/3/23 21:10, Martin Morgan wrote:
Not really helpful, but this could be simplified a bit by removing the redirect from ExperimentHub and the httr layer on top of curl, so
url = "https://functionalgenomics.upf.edu/experimenthub/gdnainrnaseqdata/LiYu22subsetBAMfiles/s32gDNA0.bam"
curl::curl_fetch_disk(url, tempfile())
Error in curl::curl_fetch_disk("https://functionalgenomics.upf.edu/experimenthub/gdnainrnaseqdata/LiYu22subsetBAMfiles/s32gDNA0.bam", :
Failed writing received data to disk/application
I notice the index file (extension .bai) works; do other BAM files work, too?
Martin
From: Bioc-devel <bioc-devel-bounces using r-project.org> on behalf of Robert Castelo <robert.castelo using upf.edu>
Date: Wednesday, March 29, 2023 at 1:18 PM
To: bioc-devel using r-project.org <bioc-devel using r-project.org>
Subject: [Bioc-devel] httr::GET() problem downloading a ExperimentHub resource
hi,
we recently added a few new ExperimentHub resources, consisting of BAM
files and their corresponding BAI files, hosted on my own server.
while they seem to be accessible, they cannot be downloaded
through the ExperimentHub API. the minimal example reproducing the
problem is this one (using Bioc devel):
library(ExperimentHub)
httr::GET("https://experimenthub.bioconductor.org/fetch/8129")
Error in curl::curl_fetch_memory(url, handle = handle) :
Failed writing received data to disk/application
while there's apparently no problem "manually" downloading the resource
using 'download.file()' and loading it with
'GenomicAlignments::readGAlignments()':
download.file("https://experimenthub.bioconductor.org/fetch/8129",
"file.bam")
trying URL 'https://experimenthub.bioconductor.org/fetch/8129'
Content type 'application/octet-stream' length 13296358 bytes (12.7 MB)
==================================================
downloaded 12.7 MB
gal <- GenomicAlignments::readGAlignments("file.bam")
gal[1:3]
GAlignments object with 3 alignments and 0 metadata columns:
      seqnames strand       cigar    qwidth     start       end     width     njunc
         <Rle>  <Rle> <character> <integer> <integer> <integer> <integer> <integer>
  [1]     chr1      +       49M1S        50     16208     16256        49         0
  [2]     chr1      +       3S47M        50     16976     17022        47         0
  [3]     chr1      -  10M177N40M        50     17046     17272       227         1
  -------
  seqinfo: 2580 sequences from an unspecified genome
any hint why 'httr::GET()' fails, while 'download.file()' doesn't?
thanks!!
robert.
ps: just to clarify, the 'httr::GET()' example is behind the following
problem:
eh <- ExperimentHub()
z <- eh[["EH8079"]]
see ?gDNAinRNAseqData and browseVignettes('gDNAinRNAseqData') for
documentation
downloading 2 resources
retrieving 2 resources
|======================================================================| 100%
Error: failed to load resource
name: EH8079
title: RNA-seq data BAM file subset of HRR589632 contaminated with 0%
gDNA
reason: 1 resources failed to download
In addition: Warning messages:
1: download failed
web resource path:
‘https://experimenthub.bioconductor.org/fetch/8129’
local file path: ‘/home/rcastelo/.cache/R/ExperimentHub/12ba1aa03_8129’
reason: Failed writing received data to disk/application
2: bfcadd() failed; resource removed
rid: BFC3
fpath: ‘https://experimenthub.bioconductor.org/fetch/8129’
reason: download failed
3: download failed
hub path: ‘https://experimenthub.bioconductor.org/fetch/8129’
cache resource: ‘EH8079 : 8129’
reason: bfcadd() failed; see warnings()
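One hypothesis consistent with the Content-Encoding observation (not confirmed in this thread): httr and the curl package presumably ask for compressed responses, so libcurl tries to gunzip a body labelled Content-Encoding: gzip, and a BGZF file, being a chain of many gzip members, may trip that decoder; download.file() evidently does not go through the same decoding step. If so, telling libcurl not to decode the body should let the download succeed. A minimal sketch under that assumption; fetch_undecoded() is a hypothetical helper, and http_content_decoding is the curl-package name of libcurl's CURLOPT_HTTP_CONTENT_DECODING option.

```r
library(curl)

# Hypothetical helper, assuming the failure comes from libcurl decoding a
# 'Content-Encoding: gzip' body: with http_content_decoding = 0 libcurl
# writes the body exactly as served, so the BGZF bytes land on disk as-is.
fetch_undecoded <- function(url, dest = tempfile(fileext = ".bam")) {
  h <- curl::new_handle(http_content_decoding = 0L, followlocation = TRUE)
  curl::curl_fetch_disk(url, dest, handle = h)
  dest
}

## Usage (requires network access):
## f <- fetch_undecoded("https://experimenthub.bioconductor.org/fetch/8129")
## The same option can be passed to httr through httr::config():
## httr::GET("https://experimenthub.bioconductor.org/fetch/8129",
##           httr::config(http_content_decoding = 0L))
```

This is only a client-side check of the hypothesis; since ExperimentHub users cannot be expected to change curl options, a lasting fix would more likely be on the serving side, i.e. not labelling .bam files with Content-Encoding: gzip.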
[[alternative HTML version deleted]]
_______________________________________________
Bioc-devel using r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Robert Castelo, PhD
Associate Professor
Dept. of Medicine and Life Sciences
Universitat Pompeu Fabra (UPF)
Barcelona Biomedical Research Park (PRBB)
Dr Aiguader 88
E-08003 Barcelona, Spain
telf: +34.933.160.514