[Bioc-devel] BiocParallel and AnnotationDbi: database disk image is malformed
Gabe Becker
becker.gabe at gene.com
Fri Jan 19 18:37:30 CET 2018
It seems like you could also force a copy of the reference object via
<dbobject>$copy() and then force a refresh of the conn slot by assigning a
new db connection into it.
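A toy sketch of why the connection would need replacing after the copy. ToyDb and newConn are hypothetical stand-ins, not AnnotationDbi classes, and an environment stands in for the database handle on the assumption that, like a connection, it is not duplicated when the object is copied. It only shows reference-class behaviour and says nothing about whether the approach works for a real annotation object.

```r
library(methods)

# Hypothetical stand-in for an annotation object holding a connection.
ToyDb <- setRefClass("ToyDb",
                     fields = list(conn = "environment",
                                   packageName = "character"))

# Hypothetical stand-in for opening a fresh database connection.
newConn <- function() new.env()

db <- ToyDb$new(conn = newConn(), packageName = "toy.db")

# $copy() gives a new object, but the handle in 'conn' is still the same one.
dbCopy <- db$copy()
shared_before <- identical(dbCopy$conn, db$conn)

# Assigning a new handle into the copy separates the two objects.
dbCopy$conn <- newConn()
shared_after <- identical(dbCopy$conn, db$conn)
```

Here shared_before records whether the copy still shares the original handle, and shared_after records the same check once a new handle has been assigned.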
I'm having trouble confirming that this would work, however, because I
actually can't reproduce the error. The naive way works for me both on my Mac
laptop (which is running an old R and Bioconductor) and on the Linux
cluster I have access to (running Bioc 3.6):
(cluster)
> getSymbol <- function ( x ) {
+ return( AnnotationDbi::mget( x , hgu95av2SYMBOL ) )
+ }
>
> x <- list( "36090_at" , "38785_at" )
>
> mclapply( x , getSymbol )
[[1]]
[[1]]$`36090_at`
[1] "TBL2"
[[2]]
[[2]]$`38785_at`
[1] "MUC1"
>
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago)
Matrix products: default
BLAS: /gnet/is2/p01/apps/R/3.4.3-20171201-current/x86_64-linux-2.6-rhel6/lib64/R/lib/libRblas.so
LAPACK: /gnet/is2/p01/apps/R/3.4.3-20171201-current/x86_64-linux-2.6-rhel6/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] hgu95av2.db_3.2.3 org.Hs.eg.db_3.5.0 AnnotationDbi_1.40.0
[4] IRanges_2.12.0 S4Vectors_0.16.0 Biobase_2.38.0
[7] BiocGenerics_0.24.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.14 digest_0.6.14 DBI_0.7 RSQLite_2.0
[5] pillar_1.1.0 rlang_0.1.6 blob_1.1.0 bit64_0.9-8
[9] bit_1.1-13 compiler_3.4.3 pkgconfig_2.0.1 memoise_1.1.0
[13] tibble_1.4.1
>
~G
On Fri, Jan 19, 2018 at 9:23 AM, Vincent Carey <stvjc at channing.harvard.edu>
wrote:
> good question
>
> some of the discussion on
>
> http://sqlite.1065341.n5.nabble.com/Parallel-access-to-read-only-in-memory-database-td91814.html
>
> seems relevant.
>
> converting the relatively small annotation package content to pure R
> read-only tables on the master before parallelizing
> might be very simple?
>
> On Fri, Jan 19, 2018 at 11:43 AM, Ludwig Geistlinger <
> Ludwig.Geistlinger at sph.cuny.edu> wrote:
>
> > Hi,
> >
> > Within a package I am developing, I would like to enable parallel probe to
> > gene mapping for a compendium of microarray datasets.
> >
> > This accordingly makes use of annotation packages such as hgu133a.db,
> > which in turn connect to the SQLite database via AnnotationDbi.
> >
> > When running in multi-core mode (i.e. using a MulticoreParam with
> > BiocParallel) using more than 2 cores, this causes the error:
> >
> > database disk image is malformed
> >
> >
> > In a very similar problem:
> >
> > https://support.bioconductor.org/p/38541/
> >
> > Adi Tarca and Dan Tenenbaum identified and resolved this problem by
> > ensuring that each process has its own unique database connection, i.e.
> > AnnotationDbi is not loaded before sending the job to the workers.
> >
> > This solution was easily realized as this analysis was carried out within
> > a script and not a package.
> >
> > However, within my package, AnnotationDbi is loaded as a dependency of my
> > package's imports.
> >
> > How to resolve this here?
> > I am not sure whether I perfectly understand the underlying mechanisms,
> > but is there a way to make my workers load their own version of
> > AnnotationDbi instead of using the one of the parent process?
> > Or am I supposed to unload all packages depending on AnnotationDbi, and
> > AnnotationDbi itself, before sending the job to the workers (and reload all
> > of them after the job has finished?)
> >
> > Thanks a lot,
> > Ludwig
> >
> >
> >
> > --
> > Dr. Ludwig Geistlinger
> > CUNY School of Public Health
> >
> > [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > Bioc-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >
>
>
>
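Vincent's suggestion quoted above (turn the annotation content into plain R data on the master before parallelizing) could look roughly like the sketch below. The two-entry probe2symbol vector is a stand-in built from the probe IDs and symbols in the transcript earlier in this message. With a real annotation package it might be built once in the parent process with something like unlist(as.list(hgu95av2SYMBOL)), which is an assumption and is not run here. getSymbolPlain is a hypothetical name.

```r
library(parallel)

# Stand-in for a lookup table materialized once in the parent process
# (assumption: a real one would be derived from the annotation package).
probe2symbol <- c("36090_at" = "TBL2", "38785_at" = "MUC1")

# Workers only index a plain character vector; no SQLite connection is used.
getSymbolPlain <- function(x) probe2symbol[x]

x <- list("36090_at", "38785_at")

# Forking is unavailable on Windows, so fall back to a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
res <- mclapply(x, getSymbolPlain, mc.cores = cores)
```

Because the forked workers never touch a database handle, this avoids sharing one SQLite connection across processes, at the cost of holding the mapping in memory.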
--
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research