[BioC] multicore Vignette or HowTo??
Martin Morgan
mtmorgan at fhcrc.org
Mon Oct 18 18:39:22 CEST 2010
On 10/18/2010 09:05 AM, Edwin Groot wrote:
> Hello all,
> I have difficulty getting the multicore package doing what it promises.
> Does anybody have a benchmark that demonstrates something intensive
> with and without multicore assistance?
> I have a dual dual-core Xeon, and $ top tells me all R can squeeze from
> my Linux system is 25% us. Here is my example:
>
>> library(Starr)
> #Read in a set of ChIP-chip arrays
>> read("array.rda")
> # $ top reports 25% us for the following:
>> array_norm <- normalize.Probes(array, method = "loess")
> #Try the same with multicore
>> library(multicore)
>> multicore:::detectCores()
> [1] 4
> #No benefit from multicore. $ top reports 25% us for the following:
>> array_norm <- normalize.Probes(array, method = "loess")
> #lattice masks out parallel from multicore. Use mcparallel instead.
>> pnorm <- mcparallel(normalize.Probes(array, method = "loess"))
Here's my favorite test of parallel functionality
> library(multicore)
> system.time(lapply(1:4, function(i) Sys.sleep(1)))
   user  system elapsed
  0.001   0.000   4.004
> system.time(mclapply(1:4, function(i) Sys.sleep(1)))
   user  system elapsed
  0.007   0.005   1.009
The elapsed time is about 4x shorter: the four one-second sleeps run
concurrently instead of one after another.
Code has to be multicore-aware. Saying something like

  pnorm <- mcparallel(normalize.Probes(array, method = "loess"))
  array_norm <- collect(pnorm)

just forks a single process to do the task; it does not split the task
across cores (multicore doesn't do anything clever, like identifying parts
of the code that could be parallelized). The whole computation still runs
on one core, which is why top keeps reporting 25% on a four-core machine.
Either the Starr author would have to implement normalize.Probes to take
advantage of multiple cores, or your own task would have to be
parallelizable at the 'user' level, like an lapply.
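
To make "parallelizable at the user level" concrete, here is a minimal
sketch. It assumes a current R, where multicore's functions are provided by
the parallel package, and a Unix-like system, since forking is not available
on Windows. slow_square() is a made-up stand-in for an independent,
CPU-bound piece of work; it has nothing to do with Starr.

```r
library(parallel)  # assumption: modern R; mclapply() formerly lived in multicore

# Hypothetical CPU-bound task: burn some cycles, then return i squared
slow_square <- function(i) {
  x <- 0
  for (k in seq_len(1e6)) x <- x + sqrt(k)  # busy work only
  i^2
}

# Forking is Unix-only, so fall back to a single core on Windows
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

serial <- lapply(1:4, slow_square)                         # one core
forked <- mclapply(1:4, slow_square, mc.cores = n_cores)   # work split over cores

# Same answers either way; only the wall-clock time should differ
stopifnot(identical(serial, forked))
```

The point is that the caller splits the work into independent pieces (here
the elements 1:4) and mclapply hands those pieces to different cores. A
single call to a function that was not written with this in mind offers
nothing to split.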
I'm really not sure why array_norm is NULL. After looking at the example
on ?normalize.Probes, I did
  dataPath <- system.file("extdata", package="Starr")
  bpmapChr1 <- readBpmap(file.path(dataPath,
                                   "Scerevisiae_tlg_chr1.bpmap"))
  cels <- c(file.path(dataPath, "Rpb3_IP_chr1.cel"),
            file.path(dataPath, "wt_IP_chr1.cel"),
            file.path(dataPath, "Rpb3_IP2_chr1.cel"))
  names <- c("rpb3_1", "wt_1", "rpb3_2")
  type <- c("IP", "CONTROL", "IP")
  rpb3Chr1 <- readCelFile(bpmapChr1, cels, names, type,
                          featureData=TRUE, log.it=TRUE)
and then (not expecting to see any speed improvement, for the reason
outlined above)
> job <- mcparallel(normalize.Probes(rpb3Chr1,method="rankpercentile"))
> job
parallelJob: processID=12120
> collect(job)
$`12120`
ExpressionSet (storageMode: lockedEnvironment)
assayData: 20000 features, 3 samples
  element names: exprs
protocolData: none
phenoData
  sampleNames: rpb3_1 wt_1 rpb3_2
  varLabels: type CEL
  varMetadata: labelDescription
featureData
  featureNames: 1 2 ... 20000 (20000 total)
  fvarLabels: chr seq pos
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation:
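
Note that collect() returns a list, so the result has to be pulled out of
it. A small sketch of that step, assuming a current R where the parallel
package supplies mcparallel() and mccollect() in place of multicore's
parallel() and collect() (Unix-like systems only); sum(1:10) is just a
placeholder for the real computation:

```r
library(parallel)  # assumption: mcparallel()/mccollect() from parallel, Unix-only

job <- mcparallel(sum(1:10))  # fork a child process that evaluates the expression
res <- mccollect(job)         # a list holding the child's result
value <- res[[1]]             # extract the result itself from the list

print(value)
```

So for the example above, the normalized ExpressionSet would be the first
element of what collect(job) returns, not the returned list itself.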
Martin
>> Normalizing probes with method: loess
> Done with 1 vs 2 in iteration 1
> #Function continues for some time and displays more messages. No
> benefit from multicore. $ top reports 25% us during the run...
>> array_norm <- collect(pnorm)
> #Oh dear, where did my normalized data go?
>> array_norm
> $`4037`
> NULL
>> sessionInfo()
> R version 2.11.1 (2010-05-31)
> x86_64-pc-linux-gnu
>
> locale:
> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_GB.UTF-8
> [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] tools grid stats graphics grDevices utils
> datasets
> [8] methods base
>
> other attached packages:
> [1] geneplotter_1.26.0 annotate_1.26.1 AnnotationDbi_1.10.2
> [4] Starr_1.4.4 affxparser_1.20.0 affy_1.26.1
> [7] Ringo_1.12.0 Matrix_0.999375-39 lattice_0.18-8
> [10] limma_3.4.4 RColorBrewer_1.0-2 Biobase_2.8.0
> [13] multicore_0.1-3
>
> loaded via a namespace (and not attached):
> [1] affyio_1.16.0 DBI_0.2-5 genefilter_1.30.0
> [4] MASS_7.3-6 preprocessCore_1.10.0 pspline_1.0-14
> [7] RSQLite_0.9-2 splines_2.11.1 survival_2.35-8
> [10] tcltk_2.11.1 xtable_1.5-6
>
> RTFMing only gives me the syntax of some functions in the multicore
> package. How do I apply successfully this thing to my code?
>
> Regards,
> Edwin
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793