makeCluster {parallel}    R Documentation
Create a Parallel Socket Cluster
Description
Creates a set of copies of R running in parallel and communicating over sockets.
Usage
makeCluster(spec, type, ...)
makePSOCKcluster(names, ...)
makeForkCluster(nnodes = getOption("mc.cores", 2L), ...)
stopCluster(cl = NULL)
setDefaultCluster(cl = NULL)
getDefaultCluster()
Arguments
spec
A specification appropriate to the type of cluster.
names
Either a character vector of host names on which to run the worker copies of R, or a positive integer (in which case that number of copies is run on ‘localhost’).
nnodes
The number of nodes to be forked.
type
One of the supported types: see ‘Details’.
...
Options to be passed to the function spawning the workers. See ‘Details’.
cl
an object of class |
Details
makeCluster creates a cluster of one of the supported types. The default type, "PSOCK", calls makePSOCKcluster. Type "FORK" calls makeForkCluster. Other types are passed to package snow.
makePSOCKcluster is an enhanced version of makeSOCKcluster in package snow. It runs Rscript on the specified host(s) to set up a worker process which listens on a socket for expressions to evaluate, and returns the results (as serialized objects).
makeForkCluster is merely a stub on Windows. On Unix-alike platforms it creates the worker process by forking.
The workers are most often running on the same host as the master, when no options need be set.
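As a minimal sketch of that common single-host case, the following starts two local workers with the default "PSOCK" type, runs a small computation on them and shuts them down. The worker count of 2 and the squaring function are illustrative choices only.

```r
library(parallel)

# Start two worker copies of R on the local machine (default type "PSOCK").
cl <- makeCluster(2L)

# Send a function to the workers and collect the (serialized) results.
squares <- unlist(parLapply(cl, 1:4, function(x) x^2))
print(squares)

# Shut the workers down once they are no longer needed.
stopCluster(cl)
```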
Several options are supported (mainly for makePSOCKcluster):
master
The host name of the master, as known to the workers. This may not be the same as it is known to the master, and on private subnets it may be necessary to specify this as a numeric IP address. For example, macOS is likely to detect a machine as ‘somename.local’, a name known only to itself.
port
The port number for the socket connection, default taken from the environment variable R_PARALLEL_PORT, then a randomly chosen port in the range 11000:11999.
timeout
The timeout in seconds for that port. This is the maximum time of zero communication between master and worker before failing. Default is 30 days (and the POSIX standard only requires values up to 31 days to be supported).
setup_timeout
The maximum number of seconds a worker attempts to connect to master before failing. Default is 2 minutes. The waiting time before the next attempt starts at 0.1 seconds and is incremented 50% after each retry.
outfile
Where to direct the stdout and stderr connection output from the workers. "" indicates no redirection (which may only be useful for workers on the local machine). Defaults to ‘/dev/null’ (‘nul:’ on Windows). The other possibility is a file path on the worker's host. Files will be opened in append mode, as all workers log to the same file.
homogeneous
Logical, default true. See ‘Note’.
rscript
See ‘Note’.
rscript_args
Character vector of additional arguments for Rscript such as --no-environ.
renice
A numerical ‘niceness’ to set for the worker processes, e.g. 15 for a low priority. OS-dependent: see psnice for details.
rshcmd
The command to be run on the master to launch a process on another host. Defaults to ssh.
user
The user name to be used when communicating with another host.
manual
Logical. If true the workers will need to be run manually.
methods
Logical. If true (default) the workers will load the methods package: not loading it saves ca 30% of the startup CPU time of the cluster.
useXDR
Logical. If true (default) serialization will use XDR: where large amounts of data are to be transferred and all the nodes are little-endian, communication may be substantially faster if this is set to false.
setup_strategy
Character. If "parallel" (default) workers will be started in parallel during cluster setup when this is possible, which is now for homogeneous "PSOCK" clusters with all workers started automatically (manual = FALSE) on the local machine. Workers will be started sequentially on other clusters, on all clusters with setup_strategy = "sequential" and on R 3.6.0 and older. This option is for expert use only (e.g. debugging) and may be removed in future versions of R.
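The options listed above are passed through the ... argument. The sketch below shows how a few of them might be combined for a local cluster; the particular values are assumptions chosen for illustration, and useXDR = FALSE is only appropriate when all nodes are little-endian.

```r
library(parallel)

# Two local workers with some non-default options (illustrative values).
cl <- makePSOCKcluster(
  2L,
  rscript_args = "--no-environ",  # additional argument for Rscript
  methods = FALSE,                # do not load the methods package on workers
  useXDR = FALSE                  # assumes every node is little-endian
)

# Each worker is a separate R process with its own process ID.
worker_pids <- unlist(clusterCall(cl, Sys.getpid))
print(worker_pids)

stopCluster(cl)
```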
Function makeForkCluster creates a socket cluster by forking (and hence is not available on Windows). It supports options port, timeout and outfile, and always uses useXDR = FALSE. It is strongly discouraged to use the "FORK" cluster with GUI front-ends or multi-threaded libraries. See mcfork for details.
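Because forking is unavailable on Windows, portable scripts may want to guard the call. The following is a sketch under that assumption: it uses a fork cluster on Unix-alike platforms and falls back to a "PSOCK" cluster elsewhere. The fallback is an illustrative design choice, not something this page prescribes.

```r
library(parallel)

# Use forking where the platform supports it, sockets otherwise.
if (.Platform$OS.type == "unix") {
  cl <- makeForkCluster(2L)
} else {
  cl <- makePSOCKcluster(2L)
}

# Both kinds of cluster are used in the same way afterwards.
worker_pids <- unlist(clusterCall(cl, Sys.getpid))
print(worker_pids)

stopCluster(cl)
```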
It is good practice to shut down the workers by calling stopCluster: however the workers will terminate themselves once the socket on which they are listening for commands becomes unavailable, which it should if the master R session is completed (or its process dies).
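One way to make sure stopCluster is always called is to register it with on.exit inside a function. The helper name with_local_cluster below is hypothetical and not part of package parallel; it is only a sketch of the pattern.

```r
library(parallel)

# Hypothetical helper: create a cluster, run 'fun' on it, always clean up.
with_local_cluster <- function(n_workers, fun) {
  cl <- makeCluster(n_workers)
  # Runs when the function exits, whether normally or through an error.
  on.exit(stopCluster(cl), add = TRUE)
  fun(cl)
}

total <- with_local_cluster(2L, function(cl) {
  sum(unlist(parLapply(cl, 1:10, function(x) x * 2)))
})
print(total)
```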
Function setDefaultCluster registers a cluster as the default one for the current session. Using setDefaultCluster(NULL) removes the registered cluster, as does stopping that cluster.
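A short sketch of the default-cluster workflow follows. It assumes that functions in package parallel such as parSapply use the registered default cluster when called with cl = NULL.

```r
library(parallel)

cl <- makeCluster(2L)

# Register the cluster as the session default.
setDefaultCluster(cl)
was_registered <- inherits(getDefaultCluster(), "cluster")

# With cl = NULL the registered default cluster is used.
doubled <- parSapply(cl = NULL, 1:3, function(x) x * 2)
print(doubled)

# Remove the registration explicitly, then stop the workers.
setDefaultCluster(NULL)
stopCluster(cl)
is_cleared <- is.null(getDefaultCluster())
```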
Value
For the cluster creators, an object of class c("SOCKcluster", "cluster").
For the default cluster setter and getter, the registered default cluster or NULL if there is no such cluster.
Note
Option homogeneous = TRUE was for years documented as ‘Are all the hosts running identical setups?’, but this was apparently more restrictive than its author intended and not required by the code.
The current interpretation of homogeneous = TRUE is that Rscript can be launched using the same path on each worker. That path is given by the option rscript and defaults to the full path to Rscript on the master. (The workers are not required to be running the same version of R as the master, nor even as each other.)
For homogeneous = FALSE, Rscript on the workers is found on their default shell's path.
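For workers on other hosts, the homogeneous and master options might be combined as in the sketch below. The helper start_remote_cluster, the host names and the IP address are all hypothetical placeholders, so the call that would actually start the cluster is left commented out.

```r
library(parallel)

# Hypothetical helper: 'hosts' and 'master_ip' are placeholders supplied by the caller.
start_remote_cluster <- function(hosts, master_ip) {
  makePSOCKcluster(
    hosts,
    master = master_ip,    # name or numeric IP of the master as seen by the workers
    homogeneous = FALSE    # find Rscript on each worker's default shell path
  )
}

# Not run (the hosts below do not exist):
# cl <- start_remote_cluster(c("node1", "node2"), "192.168.0.10")
# stopCluster(cl)
```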
For the very common usage of running both master and worker on a single multi-core host, the default settings are the appropriate ones.
A socket connection is used to communicate from the master to each worker so the maximum number of connections (default 128 but some will be in use) may need to be increased when the master process is started.
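To see how a cluster counts against that limit, the open connections can be tallied before and after cluster creation with showConnections. This is only a rough sketch: other connections opened by the session in the meantime would also be counted.

```r
library(parallel)

# Number of user connections open before the cluster exists.
before <- nrow(showConnections())

cl <- makeCluster(2L)

# Each worker holds a socket connection on the master.
during <- nrow(showConnections())
print(during - before)

stopCluster(cl)
```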
Author(s)
Luke Tierney and R Core.
Derived from the snow package.