[Bioc-devel] Data package timeouts

Sean Davis seandavi at gmail.com
Tue Dec 5 20:02:18 CET 2017


Thanks, Leo.

It turns out that that accession is not "public" yet; the "file" that
GEOquery gets is just an HTML page saying so.

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM1062236

I'll work on a fix to catch such problems.
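A minimal sketch of the kind of check described above follows. It assumes that a real SOFT download normally begins with entity lines such as "^SAMPLE = ...", while the notice for a non-public accession is an HTML page that still gets saved under a ".soft" name (as in the "File stored at: .../GSM1062236.soft" output quoted below). `looks_like_html()` and `check_soft_download()` are hypothetical helpers, not GEOquery functions, and this is not necessarily how the GEOquery fix was implemented. The usage part writes two made-up local files (the accession GSM0000000 is a placeholder), so no network access is needed.

```r
# Hypothetical helpers (not part of GEOquery): detect when a file saved as
# "<accession>.soft" is really an HTML page, e.g. a notice that the
# accession is not public yet.
looks_like_html <- function(path, n = 5L) {
  # Only the first few lines are needed; SOFT files normally start with
  # entity lines such as "^SAMPLE = GSM...".
  first <- readLines(path, n = n, warn = FALSE)
  any(grepl("<!DOCTYPE html|<html", first, ignore.case = TRUE))
}

# Fail fast with an informative error instead of handing HTML to the parser
check_soft_download <- function(path) {
  if (looks_like_html(path)) {
    stop("'", path, "' looks like an HTML page, not a SOFT file; ",
         "the accession may not be public yet.", call. = FALSE)
  }
  invisible(path)
}

# Usage with two made-up local files (placeholder accession, no network)
soft <- tempfile(fileext = ".soft")
writeLines(c("^SAMPLE = GSM0000000", "!Sample_title = made-up example"), soft)
html <- tempfile(fileext = ".soft")
writeLines(c("<!DOCTYPE html>", "<html><body>placeholder notice</body></html>"), html)

looks_like_html(soft)  # FALSE
looks_like_html(html)  # TRUE
res <- tryCatch(check_soft_download(html),
                error = function(e) conditionMessage(e))
```

Checking only the first few lines keeps the guard cheap even for large downloads; sniffing the file content also works when the server does not return an error status for the notice page.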

Sean


On Tue, Dec 5, 2017 at 1:48 PM, Leonardo Collado Torres <lcollado at jhu.edu> wrote:

> Hi Sean,
>
> I'm still seeing some timeouts with GEOquery 2.46.10 on bioc-release.
> Here's a quick example:
>
> library('GEOquery')
> getGEO('GSM1062236', getGPL = FALSE)
>
> I found it from
> https://github.com/leekgroup/recount/blob/master/tests/testthat/test-misc.R#L19
>
> Best,
> Leo
>
>
> > library('GEOquery')
> Loading required package: Biobase
> Loading required package: BiocGenerics
> Loading required package: parallel
>
> Attaching package: ‘BiocGenerics’
>
> The following objects are masked from ‘package:parallel’:
>
>     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
>     clusterExport, clusterMap, parApply, parCapply, parLapply,
>     parLapplyLB, parRapply, parSapply, parSapplyLB
>
> The following objects are masked from ‘package:stats’:
>
>     IQR, mad, sd, var, xtabs
>
> The following objects are masked from ‘package:base’:
>
>     anyDuplicated, append, as.data.frame, cbind, colMeans, colnames,
>     colSums, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
>     grepl, intersect, is.unsorted, lapply, lengths, Map, mapply, match,
>     mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
>     rbind, Reduce, rowMeans, rownames, rowSums, sapply, setdiff, sort,
>     table, tapply, union, unique, unsplit, which, which.max, which.min
>
> Welcome to Bioconductor
>
>     Vignettes contain introductory material; view with
>     'browseVignettes()'. To cite Bioconductor, see
>     'citation("Biobase")', and for packages 'citation("pkgname")'.
>
> Setting options('download.file.method.GEOquery'='auto')
> Setting options('GEOquery.inmemory.gpl'=FALSE)
> > getGEO('GSM1062236', getGPL = FALSE)
> File stored at:
> /var/folders/cx/n9s558kx6fb7jf5z_pgszgb80000gn/T//RtmpAyQR3U/GSM1062236.soft
> ## Force terminate after a long running time
> ^C
> > sessionInfo()
> R version 3.4.2 (2017-09-28)
> Platform: x86_64-apple-darwin15.6.0 (64-bit)
> Running under: macOS Sierra 10.12.6
>
> Matrix products: default
> BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
> LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] GEOquery_2.46.10    Biobase_2.38.0      BiocGenerics_0.24.0
> [4] colorout_1.1-2
>
> loaded via a namespace (and not attached):
>  [1] Rcpp_0.12.14     tidyr_0.7.2      dplyr_0.7.4      assertthat_0.2.0
>  [5] R6_2.2.2         magrittr_1.5     rlang_0.1.4      bindrcpp_0.2
>  [9] limma_3.34.2     xml2_1.1.1       readr_1.1.1      glue_1.2.0
> [13] purrr_0.2.4      hms_0.4.0        compiler_3.4.2   pkgconfig_2.0.1
> [17] bindr_0.1        tibble_1.3.4
>
>
> On Thu, Nov 30, 2017 at 11:56 AM, Leonardo Collado Torres
> <lcollado at jhu.edu> wrote:
> >
> > Thanks Sean! I was also seeing timeouts in recount related to GEOquery,
> > which I only recently looked into.
> >
> > On Thu, Nov 30, 2017 at 11:14 AM, Sean Davis <seandavi at gmail.com> wrote:
> >>
> >>
> >> On Thu, Nov 30, 2017 at 6:05 AM, Mike Smith <grimbough at gmail.com> wrote:
> >>
> >> > Thanks for the speedy response, Sean. I'll switch back to the version
> >> > using a file name shortly.
> >> >
> >>
> >> No problem. Let me know if it does not work as expected.
> >>
> >> Sean
> >>
> >>
> >>
> >> >
> >> > Cheers,
> >> > Mike
> >> >
> >> > On 30 November 2017 at 11:20, Sean Davis <seandavi at gmail.com> wrote:
> >> >
> >> >> Thanks for the report, Mike.
> >> >>
> >> >> The problem was (specifically) in parsing a GSEMatrix file using a
> >> >> filename. This should be fixed in versions 2.46.10 (release) and
> >> >> 2.47.12 (devel).
> >> >>
> >> >> Sean
> >> >>
> >> >>
> >> >> On Thu, Nov 30, 2017 at 4:09 AM, Mike Smith <grimbough at gmail.com> wrote:
> >> >>
> >> >>> Hi Mike,
> >> >>>
> >> >>> I was experiencing similar problems with the BeadArrayUseCases vignette,
> >> >>> where getGEO() from GEOquery was getting stuck in a (seemingly)
> >> >>> infinite loop processing a GSE series matrix file. It looks like both of
> >> >>> your examples try to do this too, so I suspect it's a similar issue. I
> >> >>> think the format of those files has changed recently, and it seems to be
> >> >>> causing a fair few issues with GEOquery.
> >> >>>
> >> >>> As a temporary workaround I queried GEO directly rather than using a
> >> >>> local file, but it would be nice to get it working as intended again.
> >> >>>
> >> >>> Mike
> >> >>>
> >> >>> On 29 November 2017 at 18:56, Michael Love <michaelisaiahlove at gmail.com> wrote:
> >> >>>
> >> >>> > I got simultaneous timeout notices for 'airway' and 'parathyroidSE' on
> >> >>> > both release and devel machines (release was fine leading up to the
> >> >>> > Bioc release).
> >> >>> >
> >> >>> > Not sure what the issue is; I haven't changed these packages in a
> >> >>> > while. I checked them out, and both build fine in ~30s on my
> >> >>> > machine (devel branch).
> >> >>> >
> >> >>> > Here are the reports for release:
> >> >>> >
> >> >>> > http://bioconductor.org/checkResults/release/data-experiment-LATEST/airway/malbec1-buildsrc.html
> >> >>> > http://bioconductor.org/checkResults/release/data-experiment-LATEST/parathyroidSE/malbec1-buildsrc.html
> >> >>> >
> >> >>> > The vignettes are here:
> >> >>> >
> >> >>> > http://bioconductor.org/packages/3.6/data/experiment/vignettes/airway/inst/doc/airway.html
> >> >>> > http://bioconductor.org/packages/3.6/data/experiment/vignettes/parathyroidSE/inst/doc/parathyroidSE.pdf
> >> >>> >
> >> >>> > best,
> >> >>> > Mike
> >> >>> >
> >> >>> > _______________________________________________
> >> >>> > Bioc-devel at r-project.org mailing list
> >> >>> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >> >>> >
> >> >>>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Sean Davis, MD, PhD
> >> >> Center for Cancer Research
> >> >> National Cancer Institute
> >> >> National Institutes of Health
> >> >> Bethesda, MD 20892
> >> >> https://seandavi.github.io/
> >> >> https://twitter.com/seandavis12
> >> >>
> >> >
> >> >
> >>
> >
>



