[BioC] Downloading data only once when building many annotation packages with AnnBuilder

David Weiss David.Weiss at ulb.ac.be
Tue Oct 24 18:02:33 CEST 2006


Hello,

I am using a dual processor Power Mac G5:
System Version:	Mac OS X 10.3.9 (7W98)
Kernel Version:	Darwin 7.9.0


I work with a series of custom cDNA microarray platforms (~10).  I
annotated them using ABPkgBuilder from the AnnBuilder package. This
has worked nicely. The packages built under Bioconductor 1.8 (for R 2.2)
can be installed on the newer Bioconductor 1.9 (for R 2.4) with the
following shell command, e.g., for the package "foo":


$ sudo R CMD INSTALL foo
Password:

* Installing *source* package 'foo' ...
** R
** data
**  moving datasets to lazyload DB
** help
  >>> Building/Updating help pages for package 'foo'
      Formats: text html latex example
** building package indices ...
* DONE (foo)

However, when I try to use it, lookUp returns an error (followed here by
an excerpt of the traceback):

Error in `parent.env<-`(`*tmp*`, value = NULL) :
	use of NULL environment is defunct

 > traceback()
13: `parent.env<-`(`*tmp*`, value = NULL)
12: function (n)
     {
         if (existsInFrame(n, envenv))
             getFromFrame(n, envenv)
         else {
             e <- mkenv()
             set(n, e, envenv)
             key <- getFromFrame(n, env)
             data <- lazyLoadDBfetch(key, datafile, compressed, envhook)
             parent.env(e) <- data$enclos
             vars <- names(data$bindings)
             for (i in along(vars)) set(vars[i], data$bindings[[i]],
                 e)
             e
         }
     }("env::1")
11: .Call("R_lazyLoadDBfetch", key, file, compressed, hook, PACKAGE = "base")
10: lazyLoadDBfetch(key, datafile, compressed, envhook)
9: get(paste(data, what, sep = ""))
8: mget(x, env = get(paste(data, what, sep = "")), ifnotfound = NA)
7: lookUp(keys, "foo", x)
...


I am thus rebuilding the packages anew. The problem is that
downloading the public databases (UniGene, etc.) takes a substantial
amount of time. In addition, on my system, the temporary folders are
automatically and periodically removed along with the downloaded
files, which disrupts the process when it takes very long. The
downloaded files are named automatically [with tempfile()] and put in
the per-session temporary directory:


$ ls -l /private/tmp/RtmptH8Ikl/
total 1168204
-rw-r--r--   1 David    wheel    684448428 Oct 24 12:31 file1c06dac8Hs.data
-rw-r--r--   1 David    wheel    500952812 Oct 24 12:50 file6058ed8gene2accession
-rw-r--r--   1 David    wheel     2151508 Oct 24 11:57 tempFile10d63af1
-rw-r--r--   1 David    wheel     2151508 Oct 24 11:59 tempFile3ab50c2a
-rw-r--r--   1 David    wheel     2234013 Oct 24 15:44 tempFile427c3c55
-rw-r--r--   1 David    wheel     2151508 Oct 24 12:01 tempFile4431b782
-rw-r--r--   1 David    wheel     2151508 Oct 24 11:58 tempFile60b7acd9


So I can't easily keep track of them. I am looking for a way to
download the files only once to a local, non-temporary directory, and
to retrieve or choose their names. Then, I want to build the first
package and reuse the downloaded files for the remaining
packages. Once I have achieved this, I can use the procedure to update
my annotation packages, for example, monthly. I know that ABPkgBuilder
has a "fromWeb" argument to allow for local source files but I can't
figure out how to pass the location or names of these files to the
function. Even if I knew, I would still have to figure out how to
retrieve the file names and locations in the first place. If somebody
has already solved this problem, it would save me a lot of time to know
about it. Otherwise, I will post the solution when/if I get to it.
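
For the download-once part, a minimal shell sketch under stated
assumptions: the function name fetch_once, the ANNOT_CACHE variable and
the default directory are made up for illustration, curl is assumed to
be installed, and the real source URLs have to be supplied by the
caller. It saves each file under a chosen name in a directory outside
/tmp and skips the transfer when a non-empty copy is already there. It
does not address how the saved paths are then handed to ABPkgBuilder.

```shell
#!/bin/sh
# Persistent cache directory (hypothetical default; any path outside /tmp works).
CACHE_DIR="${ANNOT_CACHE:-$HOME/annotation-sources}"

# fetch_once URL NAME
# Download URL to $CACHE_DIR/NAME unless a non-empty copy already exists.
# Prints the path of the cached file on success, returns non-zero on failure.
fetch_once() {
    url=$1
    name=$2
    dest="$CACHE_DIR/$name"
    mkdir -p "$CACHE_DIR" || return 1
    if [ ! -s "$dest" ]; then
        # Write to a ".part" file first so that an interrupted transfer
        # is never mistaken for a complete download.
        if curl -fsSL -o "$dest.part" "$url"; then
            mv "$dest.part" "$dest" || return 1
        else
            rm -f "$dest.part"
            return 1
        fi
    fi
    printf '%s\n' "$dest"
}
```

Calling it as fetch_once "$SRC_URL" Hs.data (with SRC_URL set to the
actual address of the source file) prints the cached path, so running
it again before building the next package reuses the existing copy.
For a periodic update, deleting the cached files first forces a fresh
download.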

Thanks a lot,

David Weiss Solis

---------------------------------------------------------
P.S. Session information:
 > sessionInfo()
R version 2.4.0 (2006-10-03)
powerpc-apple-darwin7.9.0

locale:
C

attached base packages:
[1] "tools"     "methods"   "stats"     "graphics"  "grDevices" "utils"
[7] "datasets"  "base"

other attached packages:
 AnnBuilder    RSQLite        DBI   annotate        XML    Biobase     marray
   "1.12.0"    "0.4-4"   "0.1-10"   "1.12.0"   "0.97-8"   "1.12.1"   "1.12.0"
      limma        foo
    "2.9.1"    "1.1.0"

-----------------------------------------------------------
Ir. David Weiss Solis
IRIBHM
Bldg C, room C.4.116 
ULB, Campus Erasme, CP602 
808 route de Lennik 
B-1070 Brussels 
Belgium

Phone: +32-2-555 4220 
Fax: +32-2-555 4655

E-mail: dweiss at ulb.ac.be

URL: http://homepages.ulb.ac.be/~dweiss/


