[Bioc-devel] is it normal for makeDBPackage to take a VERY long time?

Thu Jan 13 21:20:07 CET 2011

Hi Tim,

It is certainly doing something.  A couple of different things can
cause the slowness.  The first is that SQLForge does a lot of busy
work so that the packages it produces can "know" how many mappings to
expect from each map (there is a table called map_counts that requires
this information).  The second is that you are mapping over an order of
magnitude more probes than we would usually need to.  In the past,
performance has not been a big problem, partly because you only need to
do this step once and partly because people normally only want to map a
few tens of thousands of probes.  But you are pushing the envelope
pretty hard, and the wait time has become extreme as a result.  My
guess is that I need to add some temporary indices to the initial
mapping process to speed up this step when there are a lot of probes.
I will take a look and see what I can do.
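
To illustrate the kind of temporary index I mean, here is a minimal
sketch in R using DBI and RSQLite.  It is not SQLForge's actual code:
the accession_gene table, the column names, the index names, and the toy
data are all hypothetical, and only the table name temp_probe_map comes
from your description.  The idea is simply to index the join column
before the big mapping join and drop the index afterwards.

```r
library(DBI)  # assumes the RSQLite package is installed as well

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Toy stand-ins (hypothetical schema): probe -> accession, accession -> gene
set.seed(1)
n <- 20000
probe_map <- data.frame(
  probe_id  = sprintf("cg%08d", seq_len(n)),
  accession = sprintf("NM_%06d", sample.int(5000, n, replace = TRUE)),
  stringsAsFactors = FALSE
)
accession_gene <- data.frame(
  accession = sprintf("NM_%06d", seq_len(5000)),
  gene_id   = seq_len(5000),
  stringsAsFactors = FALSE
)
dbWriteTable(con, "temp_probe_map", probe_map)
dbWriteTable(con, "accession_gene", accession_gene)

join_sql <- "SELECT p.probe_id, g.gene_id
             FROM temp_probe_map AS p
             JOIN accession_gene AS g ON g.accession = p.accession"

# Query plan before any explicit index exists
plan_before <- dbGetQuery(con, paste("EXPLAIN QUERY PLAN", join_sql))

# Temporary indices on the join column (index names are made up)
dbExecute(con, "CREATE INDEX tmp_ag_acc ON accession_gene (accession)")
dbExecute(con, "CREATE INDEX tmp_pm_acc ON temp_probe_map (accession)")

# Query plan and the actual mapping with the indices in place
plan_after <- dbGetQuery(con, paste("EXPLAIN QUERY PLAN", join_sql))
mapped <- dbGetQuery(con, join_sql)

# Drop the indices once the mapping step is finished
dbExecute(con, "DROP INDEX tmp_ag_acc")
dbExecute(con, "DROP INDEX tmp_pm_acc")
dbDisconnect(con)
```

Comparing plan_before with plan_after shows whether SQLite picks up the
new indices.  How much this would help in practice depends on the real
schema and on the number of probes, so treat it as a sketch only.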

  Marc

On 01/13/2011 09:45 AM, Tim Triche, Jr. wrote:
> Hi Sean, (and others)
>
> Does it usually take an obscenely long time for makeDBPackage to run when
> given a bunch of refseq IDs?  I ran the following:
>
>   
> > makeDBPackage("HUMANCHIP_DB", affy=FALSE,
>     prefix="IlluminaHumanMethylation450k", fileName="acc450k.txt",
>     baseMapType='refseq', version='1.0.0', manufacturer='Illumina',
>     chipName='Human Methylation 450k',
>     manufacturerUrl='http://illumina.com/')
> baseMapType is refseq # time passes...
>
> and it's doing SOMETHING, because the table temp_probe_map in the sqlite
> file it created is filling up.  But it's been grinding away at this for the
> past 12 hours, which seems a bit excessive for mapping 806,334 refseq
> accessions.  Really, all I want is for my bimap objects to work as expected;
> the annotation integration is just gravy.
>
> The idea was to release updated 27k and completed 450k packages to handle
> the IDAT mappings, immediately followed by a methylumIDAT release, and start
> merging that and my preprocessing stuff into the [methy]lumi toolchain.  The
> 450k probes are split between two designs, so I kind of had to bite the
> bullet and roll my own schema to do the mappings efficiently, plus between
> 450k and FFPE samples there has been a LOT of weirdness lately that I'd like
> not to depend on Illumina's software to handle.  So...
>
> I started out doing this on my laptop (gave up after a few hours and moved
> it to the server):
>
>   
> > sessionInfo()
> R version 2.13.0 Under development (unstable) (2010-12-21 r53879)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] grid      stats     graphics  grDevices datasets  utils     methods
> [8] base
>
> other attached packages:
>  [1] human.db0_2.4.1                      ChIPpeakAnno_1.7.0
>  [3] limma_3.5.20                         GO.db_2.4.1
>  [5] BSgenome.Ecoli.NCBI.20080805_1.3.16  BSgenome_1.19.2
>  [7] GenomicRanges_1.3.7                  Biostrings_2.19.2
>  [9] IRanges_1.9.17                       multtest_2.6.0
> [11] biomaRt_2.7.1                        groupedPMA_0.2
> [13] betareg_2.2-3                        Formula_1.0-0
> [15] PMA_1.0.7                            huge_0.9
> [17] MASS_7.3-9                           igraph_0.5.5-1
> [19] glasso_1.4                           glmnet_1.5.1
> [21] Matrix_0.999375-46                   lattice_0.19-17
> [23] grplasso_0.4-2                       impute_1.25.0
> [25] rGammaGamma_1.0                      methylumIDAT_0.1
> [27] IlluminaHumanMethylation27k.db_1.4.0 org.Hs.eg.db_2.4.6
> [29] RSQLite_0.9-4                        DBI_0.2-5
> [31] AnnotationDbi_1.13.0                 ggplot2_0.8.9
> [33] proto_0.3-8                          lumi_2.3.5
> [35] nleqslv_1.8                          matrixStats_0.2.2
> [37] R.methodsS3_1.2.1                    gsl_1.9-8
> [39] methylumi_1.3.3                      Biobase_2.11.7
> [41] gtools_2.6.2                         reshape_0.8.3
> [43] plyr_1.4
>
> loaded via a namespace (and not attached):
>  [1] affy_1.27.2           affyio_1.17.4         annotate_1.27.1
>  [4] digest_0.4.2          hdrcde_2.15           KernSmooth_2.23-4
>  [7] lmtest_0.9-27         mgcv_1.7-2            nlme_3.1-97
> [10] preprocessCore_1.11.0 RCurl_1.5-0           sandwich_2.2-6
> [13] splines_2.13.0        survival_2.36-2       tools_2.13.0
> [16] XML_3.2-0             xtable_1.5-6
>