[Bioc-devel] httr::GET() problem downloading a ExperimentHub resource

Robert Castelo robert.castelo at upf.edu
Wed Mar 29 22:08:54 CEST 2023


good catch, but the result is really enigmatic; BAI files work, but BAM files don't:

dat <- read.csv("https://raw.githubusercontent.com/functionalgenomics/gDNAinRNAseqData/devel/inst/extdata/metadata_LiYu22subsetBAMfiles.csv")
rdatapath <- strsplit(dat$RDataPath, ":")
bamfiles <- unlist(rdatapath)[seq(1, 18, 2)]
baifiles <- unlist(rdatapath)[seq(2, 18, 2)]

bamurls <- paste0(dat$Location_Prefix, bamfiles)
baiurls <- paste0(dat$Location_Prefix, baifiles)
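
The split above relies on each RDataPath entry holding a BAM path and a BAI path separated by a colon. A minimal offline sketch of that step, using made-up paths (the `rdp` values are hypothetical and not taken from the metadata file), picks the two parts per entry instead of using fixed indices:

```r
## hypothetical RDataPath entries of the form "BAM:BAI"
rdp <- c("dir/s1.bam:dir/s1.bam.bai", "dir/s2.bam:dir/s2.bam.bai")

## split every entry at the colon; each element then has two parts
parts <- strsplit(rdp, ":", fixed = TRUE)

## first part of each entry is the BAM path, second part the BAI path
bam <- vapply(parts, `[`, character(1), 1)
bai <- vapply(parts, `[`, character(1), 2)
```

This gives the same vectors as the `seq()` indexing when every entry has exactly two parts, without hard-coding the number of rows.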

## BAM files give error
for (bf in bamurls) {
   cat(sprintf("%s\n", basename(bf)))
   tryCatch({
     curl::curl_fetch_disk(bf, tempfile())
   }, error=function(e) message(conditionMessage(e)))  ## print only the error text
}

## BAI files do not give error
for (bf in baiurls) {
   cat(sprintf("%s\n", basename(bf)))
   tryCatch({
     curl::curl_fetch_disk(bf, tempfile())
   }, error=function(e) message(conditionMessage(e)))  ## print only the error text
}
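
A possible next step, offered only as a sketch: both loops use the same call and differ only in the file type, so comparing the response headers the server sends for one BAM URL and one BAI URL might show what differs. Nothing in this thread confirms that the headers are the cause. The helper names `fetch_headers` and `header_diff` are made up for illustration; `fetch_headers` needs network access and uses the curl functions `new_handle()`, `curl_fetch_memory()` and `parse_headers_list()`.

```r
## fetch only the response headers of a URL (no body is downloaded)
## and return them as a named list; requires network access
fetch_headers <- function(url) {
  h <- curl::new_handle(nobody = TRUE)
  res <- curl::curl_fetch_memory(url, handle = h)
  curl::parse_headers_list(res$headers)
}

## names of the headers that are missing on one side or whose values differ
header_diff <- function(a, b) {
  keys <- union(names(a), names(b))
  differs <- vapply(keys, function(k) !identical(a[[k]], b[[k]]), logical(1))
  keys[differs]
}

## usage (requires network), with the objects defined above:
## header_diff(fetch_headers(bamurls[1]), fetch_headers(baiurls[1]))
```

Headers such as the content length are expected to differ between any two files, so only differences that affect how the body is decoded or written would be of interest.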

any further ideas?

robert.

On 29/3/23 21:10, Martin Morgan wrote:
>
> Not really helpful but this could be simplified a bit by removing the 
> redirect from experiment hub, and the layer from httr to curl, so
>
> url = 
> "https://functionalgenomics.upf.edu/experimenthub/gdnainrnaseqdata/LiYu22subsetBAMfiles/s32gDNA0.bam"
>
> curl::curl_fetch_disk(url, tempfile())
>
> Error in curl::curl_fetch_disk("https://functionalgenomics.upf.edu/experimenthub/gdnainrnaseqdata/LiYu22subsetBAMfiles/s32gDNA0.bam",  :
>   Failed writing received data to disk/application
>
> I notice the index file (extension .bai) works; do other BAM files 
> work, too?
>
> Martin
>
> From: Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of
> Robert Castelo <robert.castelo at upf.edu>
> Date: Wednesday, March 29, 2023 at 1:18 PM
> To: bioc-devel at r-project.org <bioc-devel at r-project.org>
> Subject: [Bioc-devel] httr::GET() problem downloading a
> ExperimentHub resource
>
> hi,
>
> we recently added a few new ExperimentHub resources, consisting of BAM
> files and their corresponding BAI files, hosted on my own server.
> while they seem to be accessible, they cannot be downloaded
> through the ExperimentHub API. a minimal example reproducing the
> problem is this one (using Bioc devel):
>
> library(ExperimentHub)
> httr::GET("https://experimenthub.bioconductor.org/fetch/8129")
> Error in curl::curl_fetch_memory(url, handle = handle) :
>    Failed writing received data to disk/application
>
> while there's apparently no problem "manually" downloading the resource
> with 'download.file()' and loading it with
> 'GenomicAlignments::readGAlignments()':
>
> download.file("https://experimenthub.bioconductor.org/fetch/8129",
> "file.bam")
> trying URL 'https://experimenthub.bioconductor.org/fetch/8129'
> Content type 'application/octet-stream' length 13296358 bytes (12.7 MB)
> ==================================================
> downloaded 12.7 MB
>
> gal <- GenomicAlignments::readGAlignments("file.bam")
> gal[1:3]
> GAlignments object with 3 alignments and 0 metadata columns:
>        seqnames strand       cigar    qwidth     start       end     width
>           <Rle>  <Rle> <character> <integer> <integer> <integer> <integer>
>    [1]     chr1      +       49M1S        50     16208     16256        49
>    [2]     chr1      +       3S47M        50     16976     17022        47
>    [3]     chr1      -  10M177N40M        50     17046     17272       227
>            njunc
>        <integer>
>    [1]         0
>    [2]         0
>    [3]         1
>    -------
>    seqinfo: 2580 sequences from an unspecified genome
>
> any hint why 'httr::GET()' fails, while 'download.file()' doesn't?
>
> thanks!!
>
> robert.
> ps: just to clarify, the 'httr::GET()' example is behind the following
> problem:
>
> eh <- ExperimentHub()
> z <- eh[["EH8079"]]
> see ?gDNAinRNAseqData and browseVignettes('gDNAinRNAseqData') for
> documentation
> downloading 2 resources
> retrieving 2 resources
> |======================================================================|
> 100%
>
> Error: failed to load resource
>    name: EH8079
>    title: RNA-seq data BAM file subset of HRR589632 contaminated with 0%
> gDNA
>    reason: 1 resources failed to download
> In addition: Warning messages:
> 1: download failed
>    web resource path: ‘https://experimenthub.bioconductor.org/fetch/8129’
>    local file path: ‘/home/rcastelo/.cache/R/ExperimentHub/12ba1aa03_8129’
>    reason: Failed writing received data to disk/application
> 2: bfcadd() failed; resource removed
>    rid: BFC3
>    fpath: ‘https://experimenthub.bioconductor.org/fetch/8129’
>    reason: download failed
> 3: download failed
>    hub path: ‘https://experimenthub.bioconductor.org/fetch/8129’
>    cache resource: ‘EH8079 : 8129’
>    reason: bfcadd() failed; see warnings()
>
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

-- 
Robert Castelo, PhD
Associate Professor
Dept. of Medicine and Life Sciences
Universitat Pompeu Fabra (UPF)
Barcelona Biomedical Research Park (PRBB)
Dr Aiguader 88
E-08003 Barcelona, Spain
telf: +34.933.160.514
