[Bioc-devel] is it normal for makeDBPackage to take a VERY long time?
Hervé Pagès
hpages at fhcrc.org
Sat Jan 15 02:50:53 CET 2011
Hi Tim,
FWIW, also make sure you use the latest version of RSQLite (currently
0.9-4). The SQLite engine shipped with RSQLite was updated a couple
of months ago and seems to be significantly faster than the previous
versions in some situations.
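A quick way to check what you have installed, as a minimal sketch: the 0.9-4 threshold is just the version mentioned above, and the requireNamespace() guard is an assumption added so the snippet also runs on a machine without RSQLite.

```r
# Compare the installed RSQLite against the version mentioned above (0.9-4).
# package_version() treats "0.9-4" and "0.9.4" as the same version.
min_version <- package_version("0.9-4")

if (requireNamespace("RSQLite", quietly = TRUE)) {
  installed <- packageVersion("RSQLite")
  status <- if (installed >= min_version) "is recent enough" else "should be updated"
  message("RSQLite ", as.character(installed), " ", status)
} else {
  message("RSQLite is not installed")
}
```

If it reports an older version, updating RSQLite before re-running makeDBPackage is cheap to try.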
Cheers,
H.
On 01/13/2011 12:20 PM, Marc Carlson wrote:
> Hi Tim,
>
> It certainly is doing something. There are a couple of different things
> that can cause slowness with this. The first is that SQLForge is doing a
> lot of busy work so that the packages it produces can "know" how many
> mappings are expected from each (there is a table called map_counts that
> requires this information). The second problem is that you are mapping
> over an order of magnitude more probes than we usually would ever need
> to do. In the past, performance has not been a big problem, in part
> because you only need to do this step once and in part because normally
> people only want to map a few tens of thousands of probes. But you seem
> to be pushing the envelope pretty hard and the wait time has become
> pretty extreme as a result. My guess is that I need to add some
> temporary indices into the initial mapping process to speed up this step
> when there are a lot of probes. I will take a look and see what I can do.
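
To illustrate the temporary-index idea, here is a rough sketch of the general pattern in plain DBI/RSQLite. It is not SQLForge's actual code: the table names (probe_acc, acc_gene), the column names, the index name and the fake IDs are all made up for the example, and it assumes a DBI recent enough to provide dbExecute().

```r
library(DBI)

# In-memory database; every table, column and index name below is
# hypothetical and only serves to illustrate the pattern.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

n <- 10000
probes <- data.frame(probe_id  = sprintf("cg%08d", seq_len(n)),
                     accession = sprintf("NM_%06d", sample(n)),
                     stringsAsFactors = FALSE)
genes <- data.frame(accession = sprintf("NM_%06d", seq_len(n)),
                    gene_id   = seq_len(n),
                    stringsAsFactors = FALSE)
dbWriteTable(con, "probe_acc", probes)
dbWriteTable(con, "acc_gene", genes)

join_sql <- "SELECT p.probe_id, g.gene_id
             FROM probe_acc p
             JOIN acc_gene g ON p.accession = g.accession"

# Create an index on the join column just for the mapping step ...
dbExecute(con, "CREATE INDEX tmp_acc_idx ON acc_gene (accession)")
mapped <- dbGetQuery(con, join_sql)
# ... and drop it again so it does not end up in the final database.
dbExecute(con, "DROP INDEX tmp_acc_idx")

dbDisconnect(con)
```

How much this helps will depend on the SQLite version and on the queries involved, so treat it as something to time on the real data rather than a guaranteed fix.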
>
> Marc
>
>
>
> On 01/13/2011 09:45 AM, Tim Triche, Jr. wrote:
>> Hi Sean, (and others)
>>
>> Does it usually take an obscenely long time for makeDBPackage to run when
>> given a bunch of refseq IDs? I ran the following:
>>
>>
>> > makeDBPackage("HUMANCHIP_DB", affy=FALSE,
>> +     prefix="IlluminaHumanMethylation450k", fileName="acc450k.txt",
>> +     baseMapType='refseq', version='1.0.0', manufacturer='Illumina',
>> +     chipName='Human Methylation 450k',
>> +     manufacturerUrl='http://illumina.com/')
>> baseMapType is refseq # time passes...
>>
>> and it's doing SOMETHING, because the table temp_probe_map in the sqlite
>> file it created is filling up. But it's been grinding away at this for the
>> past 12 hours, which seems a bit excessive for mapping 806,334 refseq
>> accessions. Really, all I want is for my bimap objects to work as expected;
>> the annotation integration is just gravy.
>>
>> The idea was to release updated 27k and completed 450k packages to handle
>> the IDAT mappings, immediately followed by a methylumIDAT release, and start
>> merging that and my preprocessing stuff into the [methy]lumi toolchain. The
>> 450k probes are split between two designs, so I kind of had to bite the
>> bullet and roll my own schema to do the mappings efficiently, plus between
>> 450k and FFPE samples there has been a LOT of weirdness lately that I'd like
>> not to depend on Illumina's software to handle. So...
>>
>> I started out doing this on my laptop (gave up after a few hours and moved
>> it to the server):
>>
>>
>> > sessionInfo()
>> R version 2.13.0 Under development (unstable) (2010-12-21 r53879)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] grid stats graphics grDevices datasets utils methods
>> [8] base
>>
>> other attached packages:
>> [1] human.db0_2.4.1                      ChIPpeakAnno_1.7.0
>> [3] limma_3.5.20                         GO.db_2.4.1
>> [5] BSgenome.Ecoli.NCBI.20080805_1.3.16  BSgenome_1.19.2
>> [7] GenomicRanges_1.3.7                  Biostrings_2.19.2
>> [9] IRanges_1.9.17                       multtest_2.6.0
>> [11] biomaRt_2.7.1                       groupedPMA_0.2
>> [13] betareg_2.2-3                       Formula_1.0-0
>> [15] PMA_1.0.7                           huge_0.9
>> [17] MASS_7.3-9                          igraph_0.5.5-1
>> [19] glasso_1.4                          glmnet_1.5.1
>> [21] Matrix_0.999375-46                  lattice_0.19-17
>> [23] grplasso_0.4-2                      impute_1.25.0
>> [25] rGammaGamma_1.0                     methylumIDAT_0.1
>> [27] IlluminaHumanMethylation27k.db_1.4.0 org.Hs.eg.db_2.4.6
>> [29] RSQLite_0.9-4                       DBI_0.2-5
>> [31] AnnotationDbi_1.13.0                ggplot2_0.8.9
>> [33] proto_0.3-8                         lumi_2.3.5
>> [35] nleqslv_1.8                         matrixStats_0.2.2
>> [37] R.methodsS3_1.2.1                   gsl_1.9-8
>> [39] methylumi_1.3.3                     Biobase_2.11.7
>> [41] gtools_2.6.2                        reshape_0.8.3
>> [43] plyr_1.4
>>
>> loaded via a namespace (and not attached):
>> [1] affy_1.27.2 affyio_1.17.4 annotate_1.27.1
>> [4] digest_0.4.2 hdrcde_2.15 KernSmooth_2.23-4
>> [7] lmtest_0.9-27 mgcv_1.7-2 nlme_3.1-97
>> [10] preprocessCore_1.11.0 RCurl_1.5-0 sandwich_2.2-6
>> [13] splines_2.13.0 survival_2.36-2 tools_2.13.0
>> [16] XML_3.2-0 xtable_1.5-6
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319