[Bioc-devel] BiocParallel and AnnotationDbi: database disk image is malformed

Gabe Becker becker.gabe at gene.com
Fri Jan 19 18:37:30 CET 2018


It seems like you could also force a copy of the reference object via
<dbobject>$copy() and then force a refresh of the conn slot by assigning a
new db connection into it.

I'm having trouble confirming that this would work, however, because I
actually can't reproduce the error. The naive way (calling mget() directly
inside mclapply()) works for me on my Mac laptop (which is running an old R
and Bioconductor) and on the Linux cluster I have access to (running Bioc 3.6):


(cluster)

> getSymbol <- function ( x ) {
+ return( AnnotationDbi::mget( x , hgu95av2SYMBOL ) )
+ }
>
> x <- list( "36090_at" , "38785_at" )
>
> mclapply( x , getSymbol )
[[1]]
[[1]]$`36090_at`
[1] "TBL2"


[[2]]
[[2]]$`38785_at`
[1] "MUC1"


>

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago)

Matrix products: default
BLAS: /gnet/is2/p01/apps/R/3.4.3-20171201-current/x86_64-linux-2.6-rhel6/lib64/R/lib/libRblas.so
LAPACK: /gnet/is2/p01/apps/R/3.4.3-20171201-current/x86_64-linux-2.6-rhel6/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] hgu95av2.db_3.2.3    org.Hs.eg.db_3.5.0   AnnotationDbi_1.40.0
[4] IRanges_2.12.0       S4Vectors_0.16.0     Biobase_2.38.0
[7] BiocGenerics_0.24.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.14    digest_0.6.14   DBI_0.7         RSQLite_2.0
 [5] pillar_1.1.0    rlang_0.1.6     blob_1.1.0      bit64_0.9-8
 [9] bit_1.1-13      compiler_3.4.3  pkgconfig_2.0.1 memoise_1.1.0
[13] tibble_1.4.1
>
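
For comparison, Vincent's suggestion quoted below (turning the annotation
content into plain R data on the master before parallelizing) could be
sketched roughly as follows. This is only a sketch under assumptions, not
something taken from the original report: probe2symbol and getSymbolFromList
are names made up for illustration, and it assumes hgu95av2.db is installed
and that forking via mclapply() is available (so not Windows). The idea is
that the forked workers index an ordinary R list and never touch the SQLite
connection.

```r
library(parallel)
library(hgu95av2.db)

# Read the probe -> symbol map out of SQLite once, in the parent process.
# as.list() on a Bimap returns a plain named list keyed by probe ID.
mapped <- mappedkeys(hgu95av2SYMBOL)
probe2symbol <- as.list(hgu95av2SYMBOL[mapped])

# Hypothetical helper: a plain list lookup, no database access in the worker.
getSymbolFromList <- function(x) probe2symbol[x]

x <- list("36090_at", "38785_at")
res <- mclapply(x, getSymbolFromList)
```

The result has the same shape as the mget() output above (one named list per
probe ID), at the cost of holding the whole map in memory on the master.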


~G

On Fri, Jan 19, 2018 at 9:23 AM, Vincent Carey <stvjc at channing.harvard.edu>
wrote:

> good question
>
> some of the discussion on
>
> http://sqlite.1065341.n5.nabble.com/Parallel-access-to-read-only-in-memory-database-td91814.html
>
> seems relevant.
>
> converting the relatively small annotation package content to pure R
> read-only tables on the master before parallelizing
> might be very simple?
>
> On Fri, Jan 19, 2018 at 11:43 AM, Ludwig Geistlinger <
> Ludwig.Geistlinger at sph.cuny.edu> wrote:
>
> > Hi,
> >
> > Within a package I am developing, I would like to enable parallel probe to
> > gene mapping for a compendium of microarray datasets.
> >
> > This accordingly makes use of annotation packages such as hgu133a.db,
> > which in turn connect to the SQLite database via AnnotationDbi.
> >
> > When running in multi-core mode (i.e. using a MulticoreParam with
> > BiocParallel) using more than 2 cores, this causes the error:
> >
> > database disk image is malformed
> >
> >
> > In a very similar problem:
> >
> > https://support.bioconductor.org/p/38541/
> >
> > Adi Tarca and Dan Tenenbaum identified and resolved this problem by
> > ensuring that each process has its own unique database connection, i.e.
> > AnnotationDbi is not loaded before sending the job to the workers.
> >
> > This solution was easily realized as this analysis was carried out within
> > a script and not a package.
> >
> > However, within my package, AnnotationDbi is loaded as a dependency of my
> > package's imports.
> >
> > How to resolve this here?
> > I am not sure whether I perfectly understand the underlying mechanisms,
> > but is there a way to make my workers load their own version of
> > AnnotationDbi instead of using the one of the parent process?
> > Or am I supposed to unload all packages depending on AnnotationDbi, and
> > AnnotationDbi itself, before sending the job to the workers (and reload all
> > of them after the job has finished?)
> >
> > Thanks a lot,
> > Ludwig
> >
> >
> >
> > --
> > Dr. Ludwig Geistlinger
> > CUNY School of Public Health
> >
> >         [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > Bioc-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >


-- 
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research
