[R-sig-hpc] Is combining mclapply and gbm tasks possible using R-3.0.1 ?
Patrick Connolly
p_connolly at slingshot.co.nz
Wed Aug 21 11:45:51 CEST 2013
Apologies for such a long question. The question is fairly simple but
takes a lot of describing.
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_NZ.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_NZ.UTF-8 LC_COLLATE=en_NZ.UTF-8
[5] LC_MONETARY=en_NZ.UTF-8 LC_MESSAGES=en_NZ.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_NZ.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] datasets parallel splines grDevices utils stats graphics
[8] methods base
other attached packages:
[1] gbm_2.1 survival_2.37-4 cairoDevice_2.19 lattice_0.20-15
loaded via a namespace (and not attached):
[1] grid_3.0.1 multicore_0.1-7 tools_3.0.1
Using a system with the above characteristics, I made a function
modifying some of the code from the examples in the gbm() function
help. The objective was to run some examples with different seeds,
and to do so in parallel using mclapply. In the interests of
limiting the size of this email body and of avoiding email software
munging the function code, I've put the code into the attached file
testing.fn.sc which can be sourced into an R session.
That function runs fine on an old, un-updated installation of
R-2.13.1 with gbm 1.6-3.1, where it is quite capable of using four
cores simultaneously. (There it needs slight modification to use
multicore instead of parallel, and the call to gbm has no n.cores
parameter.)
> testing(4)
2013-08-21 20:57:31 Begin using multicore method with phony data with 4 cores.
Core 1 uses 20442
Core 2 uses 20443
Core 3 uses 20445
Core 4 uses 20447
2013-08-21 20:57:36
....Completed testing multicore method with invented data.
$a
CV Test OOB
1 126 131 79
[...]
$d
CV Test OOB
1 123 140 81
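For readers who cannot open the attachment, the general shape of the approach described above (the same job run with different seeds, farmed out with mclapply) is roughly as follows. This is only a minimal sketch and not the attached testing() function: fit_one() and the seed values are hypothetical, and lm() stands in for gbm() so that nothing outside base R is needed.

```r
library(parallel)

# Hypothetical stand-in for one model fit: simulate data from a seed,
# fit a model, and return a small summary. lm() is used instead of
# gbm() so the sketch has no dependencies outside base R.
fit_one <- function(seed, n = 200) {
  set.seed(seed)                      # makes each task reproducible
  x <- runif(n)
  y <- 2 * x + rnorm(n, sd = 0.5)
  fit <- lm(y ~ x)
  c(seed = seed, pid = Sys.getpid(), slope = unname(coef(fit)["x"]))
}

# Hypothetical seeds, named like the $a ... $d elements in the output
seeds <- c(a = 101, b = 202, c = 303, d = 404)

# Forking is unavailable on Windows, where mc.cores must be 1.
nc <- if (.Platform$OS.type == "windows") 1L else 2L

results <- mclapply(seeds, fit_one, mc.cores = nc)
slopes  <- vapply(results, function(r) r[["slope"]], numeric(1))
print(round(slopes, 3))
```

Because each task calls set.seed() itself, the parallel results should match a plain lapply() over the same seeds, which is a convenient sanity check.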
However, when I try it with the current versions I get this:
> system.time(bbb <- testing(4))
2013-08-21 16:18:03 Begin using multicore method with phony data with 4 cores.
Core 1 uses 22812
Core 2 uses 22814
Core 3 uses 22816
Core 4 uses 22819
This session PID is 22821:
begun at 2013-08-21 16:18:04:
This session PID is 22829:
begun at 2013-08-21 16:18:04:
This session PID is 22838:
begun at 2013-08-21 16:18:04:
This session PID is 22847:
begun at 2013-08-21 16:18:04:
2013-08-21 16:18:07
....Completed testing multicore method with invented data.
user system elapsed
0.460 1.760 3.926
Warning message:
In mclapply(subsets, FUN = test.gbm, mc.cores = nc, mc.cleanup = FALSE, :
3 function calls resulted in an error
> bbb$b
[1] "Error in socketConnection(\"localhost\", port = port, server = TRUE, blocking = TRUE, : \n cannot open the connection\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in socketConnection("localhost", port = port, server = TRUE, blocking = TRUE, open = "a+b", timeout = timeout): cannot open the connection>
>
(bbb$a works properly and the errors on bbb$c and bbb$d are identical
to the above.)
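As the output shows, mclapply does not stop on a failed task; it returns an object of class "try-error" in that slot and issues a warning. A minimal sketch of how the failed elements can be picked out of such a result follows. The flaky() worker and the task names are hypothetical, the error message is simply borrowed from the output above, and the sketch assumes a Unix-alike system where forking is available.

```r
library(parallel)

# Hypothetical worker: fails for every task except "a", mimicking the
# pattern in the output above (one success, three failures).
flaky <- function(name) {
  if (name != "a") stop("cannot open the connection")
  paste("task", name, "ok")
}

tasks <- c(a = "a", b = "b", c = "c", d = "d")

# mc.preschedule = FALSE runs each task in its own fork, so one failure
# does not affect other tasks that would otherwise share a worker.
# suppressWarnings() hides the "function calls resulted in an error" note.
res <- suppressWarnings(
  mclapply(tasks, flaky, mc.cores = 2, mc.preschedule = FALSE)
)

# Flag the elements that came back as try-error objects
failed <- vapply(res, inherits, logical(1), what = "try-error")
print(names(res)[failed])
```

The original error condition is kept in the "condition" attribute of each failed element, as seen in the printout of bbb$b.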
Each pair of lines that looks like this
This session PID is 22847:
begun at 2013-08-21 16:18:04:
will look mysterious. They come from my .Rprofile, which cats the
start time and the process ID at the beginning of each R session
(handy to know sometimes). Those lines indicate that extra R
processes are started and, I assume, end with nothing to do.
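A minimal sketch of a profile fragment that would print lines of that form is given below. The actual .Rprofile is not shown in this post, so session_banner() and the exact formatting are assumptions made for illustration.

```r
# Hypothetical .Rprofile fragment; the real file is not shown in this post.
session_banner <- function() {
  paste0("This session PID is ", Sys.getpid(), ":\n",
         "begun at ", format(Sys.time(), "%Y-%m-%d %H:%M:%S"), ":\n")
}

# .First() is run at the start of an R session if the profile defines it.
.First <- function() cat(session_banner())

# Calling it directly shows what a new session would print
cat(session_banner())
```

Any R or Rscript process that is started without --vanilla or --no-init-file reads the user profile, so a banner like this appears once for every additional session that gets launched.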
That happens even when there is no problem with mclapply, such as
when only a single core is used.
> system.time(aaa <- testing())
2013-08-21 16:09:46 Begin using multicore method with phony data with 1 cores.
Core 1 uses 18484
This session PID is 18487:
begun at 2013-08-21 16:09:46:
Core 2 uses 18511
This session PID is 18543:
begun at 2013-08-21 16:09:50:
Core 3 uses 18553
This session PID is 18556:
begun at 2013-08-21 16:09:54:
Core 4 uses 18609
This session PID is 18612:
begun at 2013-08-21 16:09:58:
2013-08-21 16:10:01
....Completed testing multicore method with invented data.
user system elapsed
1.540 2.070 15.488
>
Running that same code (minus the PID stuff) on a Windows 7
installation on identical hardware takes about half that time.
Since I don't know how to get the equivalent of the PID information on
Windows, I can't tell if extra R processes are started on that
platform too. However, running a more demanding task did seem to show
that more than one core was being used, as though the OS is capable of
a degree of parallelism even when the R tasks are done serially.
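On the PID point: Sys.getpid() is part of base R and is not specific to Linux, so the same kind of report should be obtainable on Windows. A minimal sketch follows, where report_pid() is a hypothetical helper that mirrors the "Core N uses PID" lines above rather than code from the attachment.

```r
# Hypothetical helper mirroring the "Core N uses PID" lines above.
report_pid <- function(i) sprintf("Core %d uses %d", i, Sys.getpid())

# Serial run: every task reports the PID of the current session.
msgs <- vapply(1:4, report_pid, character(1))
writeLines(msgs)

# Fewer distinct PIDs than tasks means some tasks shared a process.
pids <- as.integer(sub(".* uses ", "", msgs))
print(length(unique(pids)))
```

Run serially like this, all four tasks report the same PID; several distinct PIDs would only appear if the tasks were handed to separate R processes.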
My question is how can I use gbm and mclapply without reverting to an
ancient R version?
TIA
--
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
___ Patrick Connolly
{~._.~} Great minds discuss ideas
_( Y )_ Average minds discuss events
(:_~*~_:) Small minds discuss people
(_)-(_) ..... Eleanor Roosevelt
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: testing.fn.sc
Type: application/vnd.ibm.secure-container
Size: 3831 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-hpc/attachments/20130821/84da3ccf/attachment.bin>