[Bioc-devel] ClassifyR Fails to Build on Windows

Fri Jan 16 08:50:18 CET 2015

On 01/15/2015 08:00 PM, Dario Strbenac wrote:

> I don't think that is the problem. bpparam() should automatically choose
> settings that work on Windows. BiocInstaller also doesn't pass the checking
> process, according to the online report page. The last version that worked on
> Windows was 1.11.9, so there must have been a change in 1.11.10 which caused
> a problem, which remains in version 1.11.11.

I'm not sure about the archaeology, but as Dan mentioned this is related to the
different ways in which parallel evaluation works by default on Mac / Linux
(multicore) versus Windows (SnowParam).

You can recreate the error on non-Windows with

     register(registered("SnowParam"))

which sets the default to Snow. Running the problematic code chunk from the 
vignette then reproduces the error.

I've refactored the relevant code a bit, but you have

fun <- function(sampleFolds, sampleNumber) {
     if (verbose >= 1 && sampleNumber%%10 == 0)
         message("Processing sample set ", sampleNumber, ".")
     if (bootMode == "fold") {
         lapply(1:length(sampleFolds), function(foldIndex) {
             runTest(expression, training = unlist(sampleFolds[-foldIndex]),
                     testing = sampleFolds[[foldIndex]], params = params,
                     verbose = verbose)
         })
     }
     else {
         runTest(expression, training = sampleFolds[[1]],
                 testing = sampleFolds[[2]],
                 params = params, verbose = verbose)
     }
}

results <- bpmapply(fun, samplesFolds, as.list(1:length(samplesFolds)),
                     BPPARAM = parallelParams)

Notice that runTest is not told explicitly about datasetName. When run on a
single thread or in multicore, it is found in the calling environment, but on
the new processes started by Snow it is nowhere to be found.
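
To make the mechanism concrete, here is a minimal sketch using base R's 'parallel' package rather than BiocParallel (an assumption made so that it runs without Bioconductor installed; a PSOCK cluster starts fresh R processes much as Snow does). The names implicitFun and the value "example" are hypothetical stand-ins, not ClassifyR code.

```r
library(parallel)

# Hypothetical stand-in for a variable that exists only in the calling session.
assign("datasetName", "example", envir = globalenv())

# Like the original fun(), this relies on a variable it was never passed.
implicitFun <- function(i) paste(datasetName, i)
environment(implicitFun) <- globalenv()  # behave as if defined at top level

# Single process: datasetName is found by ordinary scoping.
serialResult <- lapply(1:2, implicitFun)

# Fresh worker processes: their global environments do not contain datasetName.
cl <- makePSOCKcluster(2)
workerResult <- tryCatch(parLapply(cl, 1:2, implicitFun),
                         error = function(e) conditionMessage(e))
stopCluster(cl)

print(serialResult)
print(workerResult)  # an error message mentioning 'datasetName'
```

The serial call succeeds while the cluster call reports that the object cannot be found on the workers, which is the same shape of failure as in the vignette.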

A solution is to write a more 'functional'-style version of 'fun()', one that 
does not rely on implicit variables. I did this using '...'

fun <- function(sampleFolds, sampleNumber, ...) {
     if (verbose >= 1 && sampleNumber%%10 == 0)
         message("Processing sample set ", sampleNumber, ".")
     if (bootMode == "fold") {
         lapply(1:length(sampleFolds), function(foldIndex) {
             runTest(expression, ...,
                     training = unlist(sampleFolds[-foldIndex]),
                     testing = sampleFolds[[foldIndex]], params = params,
                     verbose = verbose)
         })
     }
     else {
         runTest(expression, ...,
                 training = sampleFolds[[1]],
                 testing = sampleFolds[[2]],
                 params = params, verbose = verbose)
     }
}

and then invoking it with enough information

results <- bpmapply(fun,
                     samplesFolds, as.list(1:length(samplesFolds)),
                     datasetName=datasetName,
                     classificationName=classificationName,
                     BPPARAM = parallelParams)
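
The same idea in a small self-contained sketch, again using base R's 'parallel' package as a stand-in for BiocParallel (an assumption, so it runs without Bioconductor). explicitFun and the value "example" are hypothetical; the point is only that everything the worker needs arrives as an argument.

```r
library(parallel)

# Hypothetical worker function: no reliance on variables outside its arguments.
explicitFun <- function(i, datasetName) paste(datasetName, i)

cl <- makePSOCKcluster(2)  # fresh R processes, similar in spirit to Snow
# 1:2 is iterated over; datasetName is a constant shipped to every call.
explicitResult <- clusterMap(cl, explicitFun, 1:2,
                             MoreArgs = list(datasetName = "example"))
stopCluster(cl)

print(explicitResult)
```

bpmapply() also has a MoreArgs argument, which may be a tidier place for constants such as datasetName and classificationName than passing them as length-one vectors to be recycled; either way the function no longer depends on what happens to exist on the worker.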

Martin

-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793