[Bioc-devel] Issue importing bigwig files with rtracklayer from Amazon Cloud Drive
Leonardo Collado Torres
lcollado at jhu.edu
Tue May 31 20:18:16 CEST 2016
Hi Michael,
Thanks!
Actually, it looks like a few more quick changes are needed. In
https://github.com/Bioconductor-mirror/rtracklayer/blob/917973eb7e9f16bbcd6f6e4b9452f9e40d9a1e94/R/bigWig.R
replace path.expand() with expandPath(). I'm not sure this applies to
every current path.expand() call, but it does at least for
https://github.com/Bioconductor-mirror/rtracklayer/blob/917973eb7e9f16bbcd6f6e4b9452f9e40d9a1e94/R/bigWig.R#L20
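Why the call at line 20 of bigWig.R matters: base R's path.expand() only rewrites a leading `~`, so a URL is passed to the C code unchanged. Assuming expandPath() is where rtracklayer resolves remote URLs (as the thread suggests), calling path.expand() directly skips that step. A minimal check; the URL is the one from the traceback below and the local path is made up for illustration:

```r
# URL from the failing recount call in this thread
u <- "http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw"

# path.expand() only expands a leading "~" (the home directory);
# any other string, including a URL, is returned as-is
expanded_url <- path.expand(u)
print(identical(expanded_url, u))

# A home-relative path, by contrast, is rewritten (illustrative path only)
expanded_home <- path.expand("~/mean_SRP009615.bw")
print(expanded_home)
```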
Best,
Leo
> library(recount); system.time( regions <- expressed_regions('SRP009615', 'chrY', cutoff = 5L) )
2016-05-31 14:11:52 loadCoverage: loading BigWig file http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw
Error in seqinfo(con) : UCSC library operation failed
In addition: Warning message:
In seqinfo(con) :
Couldn't open http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw
Timing stopped at: 0.068 0.009 0.817
> traceback()
14: .Call(BWGFile_seqlengths, path.expand(path(x)))
13: seqinfo(con)
12: seqinfo(con)
11: .local(con, format, text, ...)
10: import(file, selection = range, as = "RleList")
9: import(file, selection = range, as = "RleList")
8: FUN(X[[i]], ...)
7: lapply(as.list(X), FUN = FUN, ...)
6: lapply(as.list(X), FUN = FUN, ...)
5: lapply(bList, .loadCoverageBigWig, range = which, chr = chr, verbose = verbose)
4: lapply(bList, .loadCoverageBigWig, range = which, chr = chr, verbose = verbose)
3: loadCoverage(files = meanFile, chr = chr, chrlen = chrlen)
2: expressed_regions("SRP009615", "chrY", cutoff = 5L)
1: system.time(regions <- expressed_regions("SRP009615", "chrY", cutoff = 5L))
> options(width = 120); devtools::session_info()
Session info -----------------------------------------------------------------------------------------------------------
setting value
version R version 3.3.0 RC (2016-05-01 r70572)
system x86_64, darwin13.4.0
ui AQUA
language (EN)
collate en_US.UTF-8
tz America/New_York
date 2016-05-31
Packages ---------------------------------------------------------------------------------------------------------------
package * version date source
acepack 1.3-3.3 2014-11-24 CRAN (R 3.3.0)
AnnotationDbi 1.35.3 2016-05-27 Bioconductor
Biobase 2.33.0 2016-05-05 Bioconductor
BiocGenerics * 0.19.0 2016-05-05 Bioconductor
BiocParallel 1.7.2 2016-05-20 Bioconductor
biomaRt 2.29.2 2016-05-30 Bioconductor
Biostrings 2.41.1 2016-05-27 Bioconductor
bitops 1.0-6 2013-08-17 CRAN (R 3.3.0)
BSgenome 1.41.0 2016-05-05 Bioconductor
bumphunter 1.13.0 2016-05-05 Bioconductor
chron 2.3-47 2015-06-24 CRAN (R 3.3.0)
cluster 2.0.4 2016-04-18 CRAN (R 3.3.0)
codetools 0.2-14 2015-07-15 CRAN (R 3.3.0)
colorspace 1.2-6 2015-03-11 CRAN (R 3.3.0)
data.table 1.9.6 2015-09-19 CRAN (R 3.3.0)
DBI 0.4-1 2016-05-08 CRAN (R 3.3.0)
derfinder * 1.7.5 2016-05-20 Bioconductor
derfinderHelper 1.7.3 2016-05-20 Bioconductor
devtools 1.11.1 2016-04-21 CRAN (R 3.3.0)
digest 0.6.9 2016-01-08 CRAN (R 3.3.0)
doRNG 1.6 2014-03-07 CRAN (R 3.3.0)
foreach 1.4.3 2015-10-13 CRAN (R 3.3.0)
foreign 0.8-66 2015-08-19 CRAN (R 3.3.0)
Formula 1.2-1 2015-04-07 CRAN (R 3.3.0)
GenomeInfoDb * 1.9.1 2016-05-13 Bioconductor
GenomicAlignments 1.9.0 2016-05-05 Bioconductor
GenomicFeatures 1.25.12 2016-05-21 Bioconductor
GenomicFiles 1.9.7 2016-05-27 Bioconductor
GenomicRanges * 1.25.0 2016-05-05 Bioconductor
ggplot2 2.1.0 2016-03-01 CRAN (R 3.3.0)
gridExtra 2.2.1 2016-02-29 CRAN (R 3.3.0)
gtable 0.2.0 2016-02-26 CRAN (R 3.3.0)
Hmisc 3.17-4 2016-05-02 CRAN (R 3.3.0)
IRanges * 2.7.1 2016-05-27 Bioconductor
iterators 1.0.8 2015-10-13 CRAN (R 3.3.0)
lattice 0.20-33 2015-07-14 CRAN (R 3.3.0)
latticeExtra 0.6-28 2016-02-09 CRAN (R 3.3.0)
locfit 1.5-9.1 2013-04-20 CRAN (R 3.3.0)
magrittr 1.5 2014-11-22 CRAN (R 3.3.0)
Matrix 1.2-6 2016-05-02 CRAN (R 3.3.0)
matrixStats 0.50.2 2016-04-24 CRAN (R 3.3.0)
memoise 1.0.0 2016-01-29 CRAN (R 3.3.0)
munsell 0.4.3 2016-02-13 CRAN (R 3.3.0)
nnet 7.3-12 2016-02-02 CRAN (R 3.3.0)
pkgmaker 0.22 2014-05-14 CRAN (R 3.3.0)
plyr 1.8.3 2015-06-12 CRAN (R 3.3.0)
qvalue 2.5.2 2016-05-20 Bioconductor
RColorBrewer 1.1-2 2014-12-07 CRAN (R 3.3.0)
Rcpp 0.12.5 2016-05-14 CRAN (R 3.3.0)
RCurl 1.95-4.8 2016-03-01 CRAN (R 3.3.0)
recount * 0.99.0 2016-05-31 Bioconductor
registry 0.3 2015-07-08 CRAN (R 3.3.0)
reshape2 1.4.1 2014-12-06 CRAN (R 3.3.0)
rngtools 1.2.4 2014-03-06 CRAN (R 3.3.0)
rpart 4.1-10 2015-06-29 CRAN (R 3.3.0)
Rsamtools 1.25.0 2016-05-05 Bioconductor
RSQLite 1.0.0 2014-10-25 CRAN (R 3.3.0)
rtracklayer 1.33.2 2016-05-31 Github (Bioconductor-mirror/rtracklayer@917973e)
S4Vectors * 0.11.2 2016-05-27 Bioconductor
scales 0.4.0 2016-02-26 CRAN (R 3.3.0)
stringi 1.0-1 2015-10-22 CRAN (R 3.3.0)
stringr 1.0.0 2015-04-30 CRAN (R 3.3.0)
SummarizedExperiment 1.3.2 2016-05-20 Bioconductor
survival 2.39-4 2016-05-11 CRAN (R 3.3.0)
VariantAnnotation 1.19.1 2016-05-20 Bioconductor
withr 1.0.1 2016-02-04 CRAN (R 3.3.0)
XML 3.98-1.4 2016-03-01 CRAN (R 3.3.0)
xtable 1.8-2 2016-02-05 CRAN (R 3.3.0)
XVector 0.13.0 2016-05-05 Bioconductor
zlibbioc 1.19.0 2016-05-05 Bioconductor
On Tue, May 31, 2016 at 2:02 PM, Michael Lawrence
<lawrence.michael at gene.com> wrote:
> Thanks for pointing out that buglet. Fixed.
>
> On Tue, May 31, 2016 at 10:55 AM, Leonardo Collado Torres
> <lcollado at jhu.edu> wrote:
>> Hi Michael,
>>
>> We tried to get things working with Amazon Cloud Drive (see Abhi's
>> efforts at https://github.com/nellore/duffel/commits/master), but the
>> data are now hosted elsewhere, where the links work properly.
>>
>> I just noticed a small mistake in rtracklayer:::expandPath(). See:
>>
>>> startsWith('http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw', 'http||ftp')
>> [1] FALSE
>>> startsWith('http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw', 'http')
>> [1] TRUE
>>
>>
>> The fix is simple. At
>> https://github.com/Bioconductor-mirror/rtracklayer/blob/c4b842bc4daa4b9db26cb86f3284cf8cf5c32ebd/R/web.R#L62-L66,
>> change it to:
>>
>> expandPath <- function(x) {
>>   if (startsWith(x, "http") || startsWith(x, "ftp"))
>>     expandURL(x)
>>   else path.expand(x)
>> }
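The bug above comes from startsWith() matching a literal prefix: "http||ftp" is searched for character by character, so it never matches a real URL. The sketch below reproduces that and shows two ways to write the check. `is_remote()` and `expand_path_sketch()` are hypothetical helpers, not part of rtracklayer, and `resolve_url` is a stand-in for rtracklayer's internal expandURL(), whose behaviour is assumed rather than reproduced here.

```r
u <- "http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw"

# startsWith() compares a literal prefix, so "http||ftp" never matches
literal_check <- startsWith(u, "http||ftp")

# Hypothetical vectorized helper: "|" combines element-wise results,
# so it works on a whole vector of paths at once
is_remote <- function(x) startsWith(x, "http") | startsWith(x, "ftp")

# Hypothetical scalar dispatch mirroring the proposed expandPath();
# "||" is the idiomatic operator inside if(). resolve_url stands in for
# rtracklayer's internal expandURL() (assumption: identity by default)
expand_path_sketch <- function(x, resolve_url = identity) {
  if (startsWith(x, "http") || startsWith(x, "ftp")) resolve_url(x)
  else path.expand(x)
}

remote_flags <- is_remote(c(u, "ftp://example.org/a.bw", "~/local.bw"))
print(literal_check)
print(remote_flags)
```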
>>
>> Best,
>> Leo
>>
>> On Thu, May 5, 2016 at 8:10 PM, Michael Lawrence
>> <lawrence.michael at gene.com> wrote:
>>> I checked in something that tries to find openssl automatically on the Mac.
>>>
>>> It looks like AWS is, for some reason, returning 404 for the HEAD request that
>>> the UCSC library uses to get information about the file, such as its content size.
>>> The same thing happens when I experiment in Firefox's developer tools. The
>>> error response header claims a JSON content type, but no JSON is actually
>>> sent, so there is no further description of the error. I think this is a bug
>>> on Amazon's side.
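To reproduce the HEAD check from R rather than from the browser, base R's curlGetHeaders() retrieves only the response headers (what `curl -I` would report) and stores the last status code in its "status" attribute. `head_status()` is a hypothetical helper, and the templink URLs in this thread have presumably expired, so the call is left commented out:

```r
# Hypothetical helper: return the HTTP status code of a header-only
# request, optionally following redirects, via base R's curlGetHeaders()
head_status <- function(url, redirect = TRUE) {
  headers <- curlGetHeaders(url, redirect = redirect)
  attr(headers, "status")
}

# Usage (needs network access; a 404 here would match what is described above):
# head_status("http://duffel.rail.bio/recount/DRP000366/bw/DRR000897.bw")
```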
>>>
>>> Seems like for now you'll need to download the file first.
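A sketch of that download-first workaround, combining the `method = 'curl'`, `extra = '-L'` options from the original report (so redirects are followed) with rtracklayer's import.bw() on the local copy. `import_bw_local()` is a hypothetical name, and it assumes the curl command-line tool is on the PATH:

```r
# Hypothetical helper: download a bigWig file with the curl binary
# ("-L" follows redirects), then import the local copy with rtracklayer
import_bw_local <- function(url, destfile = tempfile(fileext = ".bw"), ...) {
  status <- utils::download.file(url, destfile, method = "curl",
                                 extra = "-L", mode = "wb")
  if (status != 0) stop("download failed for ", url)
  # Extra arguments (e.g. as = "RleList") are passed on to import.bw()
  rtracklayer::import.bw(destfile, ...)
}

# Usage (needs network access and rtracklayer installed):
# cov <- import_bw_local(
#   "http://duffel.rail.bio/recount/DRP000366/bw/DRR000897.bw",
#   as = "RleList")
```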
>>>
>>> Michael
>>>
>>> On Thu, May 5, 2016 at 2:46 PM, Leonardo Collado Torres <lcollado at jhu.edu>
>>> wrote:
>>>>
>>>> Hi Michael,
>>>>
>>>> I forgot about pkg-config (I had just done a fresh BioC 3.3 install). I assumed
>>>> the OS X binary would work out of the box.
>>>>
>>>> Anyhow, I installed rtracklayer (release) manually and got another
>>>> error (slightly different message now).
>>>>
>>>>
>>>>
>>>>
>>>> $ svn co https://hedgehog.fhcrc.org/bioconductor/branches/RELEASE_3_3/madman/Rpacks/rtracklayer
>>>> $ R CMD INSTALL rtracklayer
>>>> Loading required package: colorout
>>>> * installing to library ‘/Library/Frameworks/R.framework/Versions/3.3release/Resources/library’
>>>> * installing *source* package ‘rtracklayer’ ...
>>>> checking for pkg-config... /usr/local/bin/pkg-config
>>>> checking pkg-config is at least version 0.9.0... yes
>>>> checking for OPENSSL... yes
>>>> ## more output
>>>>
>>>> $ R
>>>> > library('rtracklayer')
>>>> > unshorten_url <- function(uri) {
>>>> + require('RCurl')
>>>> + opts <- list(
>>>> + followlocation = TRUE, # resolve redirects
>>>> + ssl.verifyhost = FALSE, # suppress certain SSL errors
>>>> + ssl.verifypeer = FALSE,
>>>> + nobody = TRUE, # perform HEAD request
>>>> + verbose = FALSE
>>>> + )
>>>> + curlhandle <- getCurlHandle(.opts = opts)
>>>> + getURL(uri, curl = curlhandle)
>>>> + info <- getCurlInfo(curlhandle)
>>>> + rm(curlhandle) # release the curlhandle!
>>>> + info$effective.url
>>>> + }
>>>> > url <- unshorten_url('http://duffel.rail.bio/recount/DRP000366/bw/DRR000897.bw')
>>>> Loading required package: RCurl
>>>> Loading required package: bitops
>>>> > url
>>>> [1] "https://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5"
>>>> > x <- import.bw(url, as = 'RleList')
>>>> Error in seqinfo(ranges) : UCSC library operation failed
>>>> In addition: Warning message:
>>>> In seqinfo(ranges) :
>>>> Couldn't open https://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5
>>>> > x <- import.bw('http://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5')
>>>> Error in seqinfo(ranges) : UCSC library operation failed
>>>> In addition: Warning messages:
>>>> 1: In seqinfo(ranges) :
>>>> TCP non-blocking connect() to content-na.drive.amazonaws.com
>>>> timed-out in select() after 10000 milliseconds - Cancelling!
>>>> 2: In seqinfo(ranges) :
>>>> Couldn't open http://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5
>>>> > ## Reproducibility info
>>>> > message(Sys.time())
>>>> 2016-05-05 17:38:30
>>>> > options(width = 120)
>>>> > devtools::session_info()
>>>> Session info -----------------------------------------------------------------------------------------------------------
>>>> setting value
>>>> version R version 3.3.0 RC (2016-05-01 r70572)
>>>> system x86_64, darwin13.4.0
>>>> ui X11
>>>> language (EN)
>>>> collate en_US.UTF-8
>>>> tz America/New_York
>>>> date 2016-05-05
>>>>
>>>> Packages ---------------------------------------------------------------------------------------------------------------
>>>> package * version date source
>>>> Biobase 2.32.0 2016-05-04 Bioconductor
>>>> BiocGenerics * 0.18.0 2016-05-04 Bioconductor
>>>> BiocParallel 1.6.0 2016-05-04 Bioconductor
>>>> Biostrings 2.40.0 2016-05-04 Bioconductor
>>>> bitops * 1.0-6 2013-08-17 CRAN (R 3.3.0)
>>>> colorout * 1.1-2 2016-05-05 Github (jalvesaq/colorout@6538970)
>>>> devtools 1.11.1 2016-04-21 CRAN (R 3.3.0)
>>>> digest 0.6.9 2016-01-08 CRAN (R 3.3.0)
>>>> GenomeInfoDb * 1.8.0 2016-05-04 Bioconductor
>>>> GenomicAlignments 1.8.0 2016-05-04 Bioconductor
>>>> GenomicRanges * 1.24.0 2016-05-04 Bioconductor
>>>> IRanges * 2.6.0 2016-05-04 Bioconductor
>>>> memoise 1.0.0 2016-01-29 CRAN (R 3.3.0)
>>>> RCurl * 1.95-4.8 2016-03-01 CRAN (R 3.3.0)
>>>> Rsamtools 1.24.0 2016-05-04 Bioconductor
>>>> rtracklayer * 1.32.0 2016-05-05 Bioconductor
>>>> S4Vectors * 0.10.0 2016-05-04 Bioconductor
>>>> SummarizedExperiment 1.2.0 2016-05-04 Bioconductor
>>>> withr 1.0.1 2016-02-04 CRAN (R 3.3.0)
>>>> XML 3.98-1.4 2016-03-01 CRAN (R 3.3.0)
>>>> XVector 0.12.0 2016-05-04 Bioconductor
>>>> zlibbioc 1.18.0 2016-05-04 Bioconductor
>>>> >
>>>>
>>>> On Thu, May 5, 2016 at 5:24 PM, Michael Lawrence
>>>> <lawrence.michael at gene.com> wrote:
>>>> > The URL redirection is something I can try to add. For the other error,
>>>> > you need to install OpenSSL and make it visible to pkg-config so that
>>>> > rtracklayer finds it during its build process.
>>>> >
>>>> > Michael
>>>> >
>>>> > On Thu, May 5, 2016 at 2:01 PM, Leonardo Collado Torres
>>>> > <lcollado at jhu.edu> wrote:
>>>> >>
>>>> >> Hi Michael,
>>>> >>
>>>> >> I have a use case that is similar to
>>>> >> https://support.bioconductor.org/p/81267/#82142 and looks to me like
>>>> >> it might need some changes in rtracklayer to work. That's why I'm
>>>> >> posting it here this time.
>>>> >>
>>>> >> Basically, I'm trying to use rtracklayer to import a bigWig file over
>>>> >> the web from a different type of URL than before. Using
>>>> >> utils::download.file() with the defaults doesn't work; I have to use
>>>> >> method = 'curl' and extra = '-L'.
>>>> >>
>>>> >> More specifically, the original URL
>>>> >> http://duffel.rail.bio/recount/DRP000366/bw/DRR000897.bw redirects to
>>>> >> the effective URL
>>>> >> https://content-na.drive.amazonaws.com/cdproxy/templink/i_aQAPZJkJ9d9lN1NO5DJJtlbpvAdgbNuc1SkqSTHFouFiZq5
>>>> >>
>>>> >> Using the second URL with utils::download.file() and the default
>>>> >> method doesn't work either, although it does work in the browser.
>>>> >>
>>>> >>
>>>> >> As you can see, downloading the file doesn't work out of the box, so
>>>> >> I guess it's not surprising that rtracklayer gives me errors like:
>>>> >>
>>>> >> In seqinfo(ranges) :
>>>> >> No openssl available in netConnectHttps for
>>>> >> content-na.drive.amazonaws.com : 443
>>>> >>
>>>> >> You can find further details (code and log file) at
>>>> >> https://gist.github.com/lcolladotor/c500dd79d49aed1ef33ade5417111453
>>>> >>
>>>> >> Thanks,
>>>> >> Leo
>>>> >
>>>> >
>>>
>>>
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel