[Bioc-devel] BiocParallel and AnnotationDbi: database disk image is malformed

Martin Morgan martin.morgan at roswellpark.org
Fri Jan 19 19:03:45 CET 2018


On 01/19/2018 12:23 PM, Vincent Carey wrote:
> Good question.
> 
> Some of the discussion at
> 
> http://sqlite.1065341.n5.nabble.com/Parallel-access-to-read-only-in-memory-database-td91814.html
> 
> seems relevant.
> 
> Might it be simplest to convert the relatively small annotation package
> content to plain, read-only R tables on the master before parallelizing?
> 
> On Fri, Jan 19, 2018 at 11:43 AM, Ludwig Geistlinger <Ludwig.Geistlinger at sph.cuny.edu> wrote:
> 
>> Hi,
>>
>> Within a package I am developing, I would like to enable parallel
>> probe-to-gene mapping for a compendium of microarray datasets.
>>
>> This relies on annotation packages such as hgu133a.db, which in turn
>> connect to their SQLite databases via AnnotationDbi.
>>
>> When running in multicore mode (i.e., using a MulticoreParam with
>> BiocParallel) with more than 2 cores, this causes the error:
>>
>> database disk image is malformed
>>
>>
>> In a very similar problem:
>>
>> https://support.bioconductor.org/p/38541/
>>
>> Adi Tarca and Dan Tenenbaum identified and resolved this problem by
>> ensuring that each process has its own database connection, i.e., by
>> not loading AnnotationDbi before sending the job to the workers.
>>
>> This solution was easy to implement because that analysis was carried
>> out in a script rather than in a package.
>>
>> However, in my package, AnnotationDbi is loaded as a dependency of the
>> packages my package imports.
>>
>> How can I resolve this within a package?

Can you be a little more specific here? The problem likely isn't with
AnnotationDbi per se, but with the annotation package you use. Also, the
connection on the worker is bad, but it could be re-created (using, e.g.,
dbfile(org.Hs.eg.db)...); a toy example would probably help, though.

Martin
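Re-creating the connection on the worker, as suggested above, can be sketched as follows. SQLite's documentation warns against opening a connection, calling fork(), and then using that connection in the child process, which is the situation a MulticoreParam worker is in when it uses a connection the parent opened. This is a minimal illustration, assuming DBI, RSQLite and BiocParallel are installed; it uses a small throwaway SQLite file, and the table probe_map, its columns and the helper lookup_symbols() are hypothetical. For a real annotation package the path would come from something like dbfile(hgu133a.db), and the query would have to match that package's schema, which is not shown here.

```r
## Minimal sketch, assuming DBI, RSQLite and BiocParallel are installed.
## The table 'probe_map' and lookup_symbols() are hypothetical.
library(DBI)
library(RSQLite)
library(BiocParallel)

## Stand-in database; with an annotation package the path would instead
## be something like dbfile(hgu133a.db)
db_path <- tempfile(fileext = ".sqlite")
con <- dbConnect(SQLite(), db_path)
dbWriteTable(con, "probe_map", data.frame(
    probe_id = c("p1", "p2", "p3"),
    symbol = c("GENE_A", "GENE_B", "GENE_C"),
    stringsAsFactors = FALSE
))
## Close the parent's connection before any workers are forked
dbDisconnect(con)

## Runs inside each worker: opens a private read-only connection,
## reads the mapping, and closes the connection again on exit
lookup_symbols <- function(probes, path) {
    con <- DBI::dbConnect(RSQLite::SQLite(), path,
                          flags = RSQLite::SQLITE_RO)
    on.exit(DBI::dbDisconnect(con))
    tab <- DBI::dbReadTable(con, "probe_map")
    tab$symbol[match(probes, tab$probe_id)]
}

## One vector of probe IDs per (hypothetical) dataset
datasets <- list(ds1 = c("p1", "p3"), ds2 = "p2")
res <- bplapply(datasets, lookup_symbols, path = db_path,
                BPPARAM = MulticoreParam(workers = 2))
```

Reading the whole table in each worker keeps the sketch short; with a large database, a query restricted to the needed probes would be more economical.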

>> I am not sure whether I fully understand the underlying mechanisms,
>> but is there a way to make my workers load their own copy of
>> AnnotationDbi instead of using the one in the parent process?
>> Or am I supposed to unload all packages depending on AnnotationDbi, and
>> AnnotationDbi itself, before sending the job to the workers (and reload
>> them all after the job has finished)?
>>
>> Thanks a lot,
>> Ludwig
>>
>>
>>
>> --
>> Dr. Ludwig Geistlinger
>> CUNY School of Public Health
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
> 

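The alternative suggested earlier in the thread, extracting the mapping into ordinary R objects on the master so that workers never touch SQLite, might look like the minimal sketch below. It assumes AnnotationDbi, BiocParallel and hgu133a.db are installed; probe_sets is a hypothetical stand-in for the probe IDs of each dataset in the compendium, and keeping only the first symbol for probes that map to several genes (multiVals = "first") is an illustrative choice, not a recommendation from the thread.

```r
## Minimal sketch, assuming AnnotationDbi, BiocParallel and hgu133a.db
## are installed; 'probe_sets' is a hypothetical stand-in.
library(AnnotationDbi)
library(BiocParallel)
library(hgu133a.db)

## Query the SQLite database once, on the master; the result is a plain
## named character vector (names are probe IDs, values are symbols)
sym <- mapIds(hgu133a.db, keys = keys(hgu133a.db, keytype = "PROBEID"),
              column = "SYMBOL", keytype = "PROBEID",
              multiVals = "first")

## Hypothetical per-dataset probe IDs
probe_sets <- list(ds1 = head(names(sym), 5), ds2 = tail(names(sym), 5))

## Workers only subset the in-memory vector; they issue no database queries
res <- bplapply(probe_sets, function(probes, lookup) unname(lookup[probes]),
                lookup = sym, BPPARAM = MulticoreParam(workers = 2))
```

The lookup vector is passed to the workers as an ordinary argument, so this approach trades some memory for never using a database connection across processes.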



