[Rd] parallel PSOCK connection latency is greater on Linux?
Jeff
jeff at vtkellers.com
Mon Nov 2 14:28:55 CET 2020
Could TCP_NODELAY and TCP_QUICKACK be exposed to the R user so that
they might determine what is best for their potentially latency- or
throughput-sensitive application?
Best,
Jeff
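For context, both options boil down to plain setsockopt() calls on the connected descriptor, which is where R's C-level connection code would have to apply whatever the user chooses. A minimal sketch, assuming an already connected socket fd; the helper names are illustrative, not R's internals:

    #include <netinet/in.h>     /* IPPROTO_TCP */
    #include <netinet/tcp.h>    /* TCP_NODELAY, TCP_QUICKACK */
    #include <sys/socket.h>     /* setsockopt */

    /* Disable Nagle's algorithm: small writes go out immediately
       instead of being coalesced while earlier data is unacknowledged. */
    static int set_nodelay(int fd)
    {
        int one = 1;
        return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
    }

    #ifdef TCP_QUICKACK
    /* Ask the kernel to ACK incoming data immediately rather than
       delaying (Linux-specific; see the caveat further down). */
    static int set_quickack(int fd)
    {
        int one = 1;
        return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &one, sizeof(one));
    }
    #endif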
On Mon, Nov 2, 2020 at 14:05, Iñaki Ucar <iucar using fedoraproject.org>
wrote:
> On Mon, 2 Nov 2020 at 02:22, Simon Urbanek
> <simon.urbanek using r-project.org> wrote:
>>
>> It looks like R sockets on Linux could do with TCP_NODELAY --
>> without (status quo):
>
> How many network packets are generated with and without it? If there
> are many small writes and thus setting TCP_NODELAY causes many small
> packets to be sent, it might make more sense to set TCP_QUICKACK
> instead.
>
> Iñaki
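One caveat with the TCP_QUICKACK suggestion: on Linux it is not a permanent setting. tcp(7) notes that it only switches quickack mode temporarily and the kernel can fall back to delayed ACKs on its own, so code that relies on it typically re-arms the flag around each read. A sketch under that assumption (fd, buf and the wrapper name are illustrative, not R's internals):

    #include <sys/types.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* TCP_QUICKACK only toggles quickack mode temporarily, so a common
       pattern is to set it again after every read from the socket. */
    static ssize_t read_quickack(int fd, void *buf, size_t len)
    {
        ssize_t n = read(fd, buf, len);
    #ifdef TCP_QUICKACK
        int one = 1;
        (void) setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &one, sizeof(one));
    #endif
        return n;
    }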
>
>> Unit: microseconds
>>                    expr      min       lq     mean  median       uq      max neval
>>  clusterEvalQ(cl, iris) 1449.997 43991.99 43975.21 43997.1 44001.91 48027.83  1000
>>
>> exactly the same machine + R but with TCP_NODELAY enabled in
>> R_SockConnect():
>>
>> Unit: microseconds
>>                    expr     min     lq     mean  median      uq      max neval
>>  clusterEvalQ(cl, iris) 156.125 166.41 180.8806 170.247 174.298 5322.234  1000
>>
>> Cheers,
>> Simon
>>
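A hypothetical sketch of the shape of such a change -- not the actual R_SockConnect() patch, just where a TCP_NODELAY call would logically sit, with illustrative names:

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    /* Hypothetical shape of the change, not the real R_SockConnect() code:
       once the cluster socket is connected, disable Nagle so each small
       serialized message is written to the wire immediately instead of
       stalling behind the peer's delayed ACK. */
    static int connect_with_nodelay(int fd, const struct sockaddr *sa, socklen_t salen)
    {
        int one = 1;
        if (connect(fd, sa, salen) < 0)
            return -1;
        return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
    }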
>>
>> > On 2/11/2020, at 3:39 AM, Jeff <jeff using vtkellers.com> wrote:
>> >
>> > I'm exploring latency overhead of parallel PSOCK workers and
>> > noticed that serializing/unserializing data back to the main R
>> > session is significantly slower on Linux than it is on Windows/MacOS
>> > with similar hardware. Is there a reason for this difference and is
>> > there a way to avoid the apparent additional Linux overhead?
>> >
>> > I attempted to isolate the behavior with a test that simply
>> > returns an existing object from the worker back to the main R
>> > session.
>> >
>> > library(parallel)
>> > library(microbenchmark)
>> > gcinfo(TRUE)
>> > cl <- makeCluster(1)
>> > (x <- microbenchmark(clusterEvalQ(cl, iris), times = 1000, unit = "us"))
>> > plot(x$time, ylab = "microseconds")
>> > head(x$time, n = 10)
>> >
>> > On Windows/MacOS, the test runs in 300-500 microseconds depending
>> > on hardware. A few of the 1000 runs are an order of magnitude slower
>> > but this can probably be attributed to garbage collection on the
>> > worker.
>> >
>> > On Linux, the first 5 or so executions run at comparable speeds
>> > but all subsequent executions are two orders of magnitude slower
>> > (~40 milliseconds).
>> >
>> > I see this behavior across various platforms and hardware
>> > combinations:
>> >
>> > Ubuntu 18.04 (Intel Xeon Platinum 8259CL)
>> > Linux Mint 19.3 (AMD Ryzen 7 1800X)
>> > Linux Mint 20 (AMD Ryzen 7 3700X)
>> > Windows 10 (AMD Ryzen 7 4800H)
>> > MacOS 10.15.7 (Intel Core i7-8850H)
>> >
>
>
>
> --
> Iñaki Úcar