[BioC] GOstats locked database
Janet Young
jayoung at fhcrc.org
Wed Jan 23 23:29:36 CET 2008
Hi all,
I'm having some trouble with a locked database with GOstats, perhaps
due to running multiple simultaneous processes that are all accessing
GO.db?
I'm using R CMD BATCH to run an R script I wrote, and I'm doing that
simultaneously from 12 different terminal windows, each logged in to
a single node of a linux cluster. Some processes may be sharing a
node (2 CPU per node). I'm happy to send the entire script, if that's
useful, but for now here are just some snippets. Here's the basic
problem:
> params <- new("GOHyperGParams", geneIds = geneentrezIDs,
      universeGeneIds = allgeneentrezIDs, ontology = "BP",
      annotation = "org.Hs.eg.db", pvalueCutoff = hgCutoff,
      conditional = FALSE, testDirection = "over")
> thishgOver<-hyperGTest(params)
Error in sqliteFetch(rs, n = -1, ...) :
  RSQLite driver: (RS_SQLite_fetch: failed first step: database is locked)
Calls: hyperGTest ... dbGetQuery -> sqliteQuickSQL -> sqliteFetch -> .Call
Execution halted
It's a very sporadic problem - I'm actually using the script to loop
through a bunch of simulated datasets and run hyperGTest - it does
fine for a while and then suddenly has a problem. I can't be sure,
but it seems like several of the processes I was running
simultaneously all had a problem around the same time (which wouldn't
be surprising if something suddenly happened to the database).
It's also possible that our linux nodes are having some intermittent
connectivity issues to the mounted drives - could that cause the
database locked error? If so would there be a way to make hyperGTest
robust to a temporary problem like that?
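To make the question concrete, here is a minimal sketch of the kind of wrapper I have in mind. It only retries when the error message contains "database is locked", and it assumes that waiting and trying again is actually safe here, which I have not confirmed. The name retry_if_locked and its arguments are made up for illustration; they are not part of GOstats or RSQLite.

```r
# Hypothetical helper (not part of GOstats/RSQLite): call fn() and retry
# when the error message mentions a locked database.
# Assumes max_tries >= 1.
retry_if_locked <- function(fn, max_tries = 5, wait_seconds = 30) {
  for (attempt in seq_len(max_tries)) {
    # Capture any error as a condition object instead of halting.
    result <- tryCatch(fn(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    locked <- grepl("database is locked", conditionMessage(result),
                    fixed = TRUE)
    # Re-signal unrelated errors at once, and the lock error on the last try.
    if (!locked || attempt == max_tries) {
      stop(result)
    }
    message("Attempt ", attempt, " hit a locked database; retrying in ",
            wait_seconds, " s")
    Sys.sleep(wait_seconds)
  }
}
```

In my script the call would then become something like thishgOver <- retry_if_locked(function() hyperGTest(params)). Any other error still stops the run as before.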
As well as hyperGTest, the script also accesses GO information at
various points, using commands like these:
> Term(get(names(genes)[b],GOTERM))
> geneentrezIDs <- geneentrezIDs[!is.na(mget(geneentrezIDs,
      envir = org.Hs.egGO, ifnotfound = NA))]
I was running a very similar version of the script last week, with no
problem, and I think the above two commands are the only things I've
added that might be accessing the GO data. I'm not clear on which of
these commands use the same database as one another: (a) mget from
org.Hs.egGO, (b) hyperGTest with annotation="org.Hs.eg.db", (c) get
from GOTERM.
Here is the output of sessionInfo(), run just before I started
looping through the datasets, so several iterations of the mget from
org.Hs.egGO and the hyperGTest have happened after running this
sessionInfo, but I think all relevant libraries were loaded. (Is
there a way to make R output sessionInfo immediately before it
terminates with an error, when running in batch mode?)
> sessionInfo()
R version 2.6.1 Patched (2007-12-02 r43572)
i686-pc-linux-gnu
locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;
LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;
LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] splines tools stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] org.Hs.eg.db_2.0.2 GOstats_2.4.0 Category_2.4.0
[4] genefilter_1.16.0 survival_2.34 RBGL_1.14.0
[7] annotate_1.16.1 xtable_1.5-2 GO.db_2.0.2
[10] AnnotationDbi_1.0.6 RSQLite_0.6-4 DBI_0.2-4
[13] Biobase_1.16.2 graph_1.16.1
loaded via a namespace (and not attached):
[1] cluster_1.11.9
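On the parenthetical question above about printing sessionInfo() just before a batch run dies: one possibility might be the "error" option, following the pattern the dump.frames help page gives for batch jobs. This is an untested sketch on my part; the function name install_session_info_handler is made up for illustration.

```r
# Hypothetical helper: install an error handler that prints sessionInfo()
# and then quits with a non-zero exit status. Intended for non-interactive
# runs such as R CMD BATCH. Returns the previous option value invisibly so
# the setting can be restored with options(old).
install_session_info_handler <- function() {
  options(error = quote({
    print(sessionInfo())
    q(save = "no", status = 1)
  }))
}
```

Calling install_session_info_handler() once near the top of the script should be enough. R prints the error message first and then runs the handler, so the session information would appear right after the error in the .Rout file.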
And here's some other, possibly pertinent information:
[12] kpvpt50:/home/jayoung/traskdata/janet/forOthers/forIlona/GOanalysis/doGOmoreregions_slightly_better_again/DCLoss_10percent> ls -l ~/traskdata/lib_linux/R/library/GO.db/extdata/
total 37364
-rw-r--r-- 1 jayoung trasklab 38252544 Dec 3 13:55 GO.sqlite
So I can write to GO.sqlite. Should I make it read-only, even for
myself? Would that cause problems if I want to overwrite it in future?
[93] bedrock:/home/jayoung/traskdata/janet/forOthers/forIlona/GOanalysis/doGOmoreregions_slightly_better_again> ls -l ~/traskdata/lib_linux/R/library/org.Hs.eg.db/extdata/
total 187130
-rw-r--r-- 1 jayoung trasklab 95802368 Dec 13 14:50 org.Hs.eg.sqlite
Thanks for any advice - this is a tricky one as it happens sometime
in the middle of a ~12 hour run, and is not necessarily reproducible.
Hopefully I've provided enough information here to track down the
problem.
Janet
-------------------------------------------------------------------
Dr. Janet Young (Trask lab)
Fred Hutchinson Cancer Research Center
1100 Fairview Avenue N., C3-168,
P.O. Box 19024, Seattle, WA 98109-1024, USA.
tel: (206) 667 1471 fax: (206) 667 6524
email: jayoung at fhcrc.org
http://www.fhcrc.org/labs/trask/