[BioC] multicore Vignette or HowTo??

Martin Morgan mtmorgan at fhcrc.org
Mon Oct 18 18:39:22 CEST 2010


On 10/18/2010 09:05 AM, Edwin Groot wrote:
> Hello all,
> I have difficulty getting the multicore package doing what it promises.
> Does anybody have a benchmark that demonstrates something intensive
> with and without multicore assistance?
> I have a dual dual-core Xeon, and $ top tells me all R can squeeze from
> my Linux system is 25% us. Here is my example:
> 
>> library(Starr)
> #Read in a set of ChIP-chip arrays
>> read("array.rda")
> # $ top reports 25% us for the following:
>> array_norm <- normalize.Probes(array, method = "loess")
> #Try the same with multicore
>> library(multicore)
>> multicore:::detectCores()
> [1] 4
> #No benefit from multicore. $ top reports 25% us for the following:
>> array_norm <- normalize.Probes(array, method = "loess")
> #lattice masks out parallel from multicore. Use mcparallel instead.
>> pnorm <- mcparallel(normalize.Probes(array, method = "loess"))

Here's my favorite test of parallel functionality:

> library(multicore)
> system.time(lapply(1:4, function(i) Sys.sleep(1)))
   user  system elapsed
  0.001   0.000   4.004
> system.time(mclapply(1:4, function(i) Sys.sleep(1)))
   user  system elapsed
  0.007   0.005   1.009

Elapsed time drops from about 4 s to about 1 s, a 4x speed-up!
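
Since you asked for something intensive, here is a CPU-bound variant of the same test. Treat it as a sketch: busy() is a made-up workload, mc.cores = 2 is an arbitrary choice, and it is written against the parallel package (which took over multicore's mclapply interface in later R releases) so that it runs on a current R; on a Unix-alike the mclapply call should look the same under multicore.

```r
library(parallel)  # assumption: stands in for multicore, same mclapply() call

# Hypothetical CPU-bound task: generate and sort random numbers, then sum them.
busy <- function(i) {
    set.seed(i)                 # make each task reproducible
    sum(sort(runif(2e5)))
}

# Run the four tasks one after another, then across two forked workers.
t_serial <- system.time(r1 <- lapply(1:4, busy))["elapsed"]
t_forked <- system.time(r2 <- mclapply(1:4, busy, mc.cores = 2))["elapsed"]

# Elapsed times side by side; the actual ratio depends on the machine.
print(c(serial = unname(t_serial), forked = unname(t_forked)))

# Both versions should compute the same values.
same <- isTRUE(all.equal(r1, r2))
print(same)
```

Watching top while the mclapply line runs should show more than one R process busy at once.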

Code has to be multicore-aware, and saying something like

    pnorm <- mcparallel(normalize.Probes(array, method = "loess"))
    array_norm <- collect(pnorm)

just forks a single child process to run the whole task; it does not
split the task across cores (multicore doesn't do anything clever, like
identify parts of the code that could be parallelized). Either the Starr
author would have to implement normalize.Probes to take advantage of
multiple cores, or your own task would have to be parallelizable at the
'user' level, like an lapply.
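
To make 'parallelizable at the user level' concrete, here is a rough sketch of a task that splits into independent pieces. Everything in it is invented for illustration: the matrix m is random data, center_column() is a hypothetical per-column step, and it has nothing to do with what normalize.Probes does internally. It uses the parallel package (the later home of mclapply) so that it runs on a current R.

```r
library(parallel)  # assumption: stands in for multicore, same mclapply() call

set.seed(1)
m <- matrix(rnorm(2000), ncol = 4)   # made-up data: 500 rows, 4 "arrays"

# Hypothetical per-column step: subtract the column median.
center_column <- function(j) m[, j] - median(m[, j])

# Each column is independent of the others, so the columns can be
# handed to separate forked workers and glued back together afterwards.
cols <- mclapply(seq_len(ncol(m)), center_column, mc.cores = 2)
centered <- do.call(cbind, cols)

# Serial reference computation for comparison.
serial <- apply(m, 2, function(x) x - median(x))
ok <- isTRUE(all.equal(centered, serial))
print(ok)
```

The point is only that the split into pieces has to come from you (or from the package author); mclapply just runs the pieces it is given.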

I'm really not sure why array_norm is NULL. After looking at the example
on ?normalize.Probes, I did

    dataPath <- system.file("extdata", package="Starr")
    bpmapChr1 <- readBpmap(file.path(dataPath,
        "Scerevisiae_tlg_chr1.bpmap"))
    cels <- c(file.path(dataPath, "Rpb3_IP_chr1.cel"),
        file.path(dataPath, "wt_IP_chr1.cel"),
        file.path(dataPath, "Rpb3_IP2_chr1.cel"))
    names <- c("rpb3_1", "wt_1", "rpb3_2")
    type <- c("IP", "CONTROL", "IP")
    rpb3Chr1 <- readCelFile(bpmapChr1, cels, names, type,
        featureData=TRUE, log.it=TRUE)


and then (not expecting to see any speed improvement, for the reason
outlined above)

> job <- mcparallel(normalize.Probes(rpb3Chr1,method="rankpercentile"))
> job
 parallelJob: processID=12120
> collect(job)
$`12120`
ExpressionSet (storageMode: lockedEnvironment)
assayData: 20000 features, 3 samples
  element names: exprs
protocolData: none
phenoData
  sampleNames: rpb3_1 wt_1 rpb3_2
  varLabels: type CEL
  varMetadata: labelDescription
featureData
  featureNames: 1 2 ... 20000 (20000 total)
  fvarLabels: chr seq pos
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation:
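
Note that the result comes back as a list with one element per job, named by the child's process ID, so the object itself has to be pulled out of that list. A small sketch of the pattern, with sqrt(16) as a stand-in for the real computation; it is written against the parallel package, where multicore's collect() is called mccollect():

```r
library(parallel)  # assumption: stands in for multicore; collect() is mccollect() here

# Fork a child process to evaluate a (trivial, stand-in) expression.
job <- mcparallel(sqrt(16))

# Wait for the child and fetch its result: a list with one element per job.
res <- mccollect(job)
print(names(res))       # the child's process ID, as a character string

# Extract the value itself from the one-element list.
value <- res[[1]]
print(value)
```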

Martin

>> Normalizing probes with method: loess
> Done with 1 vs 2 in iteration 1 
> #Function continues for some time and displays more messages. No
> benefit from multicore. $ top reports 25% us during the run...
>> array_norm <- collect(pnorm)
> #Oh dear, where did my normalized data go?
>> array_norm
> $`4037`
> NULL
>> sessionInfo()
> R version 2.11.1 (2010-05-31) 
> x86_64-pc-linux-gnu 
> 
> locale:
>  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
>  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
>  [5] LC_MONETARY=C              LC_MESSAGES=en_GB.UTF-8   
>  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
> 
> attached base packages:
> [1] tools     grid      stats     graphics  grDevices utils
>     datasets 
> [8] methods   base     
> 
> other attached packages:
>  [1] geneplotter_1.26.0   annotate_1.26.1      AnnotationDbi_1.10.2
>  [4] Starr_1.4.4          affxparser_1.20.0    affy_1.26.1         
>  [7] Ringo_1.12.0         Matrix_0.999375-39   lattice_0.18-8      
> [10] limma_3.4.4          RColorBrewer_1.0-2   Biobase_2.8.0       
> [13] multicore_0.1-3     
> 
> loaded via a namespace (and not attached):
>  [1] affyio_1.16.0         DBI_0.2-5             genefilter_1.30.0    
>  [4] MASS_7.3-6            preprocessCore_1.10.0 pspline_1.0-14       
>  [7] RSQLite_0.9-2         splines_2.11.1        survival_2.35-8      
> [10] tcltk_2.11.1          xtable_1.5-6         
> 
> RTFMing only gives me the syntax of some functions in the multicore
> package. How do I apply successfully this thing to my code?
> 
> Regards,
> Edwin


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


