[R-sig-hpc] Excessive network traffic and slow code execution when running R via qsub on an HPC

Rainer M Krug Rainer at krugs.de
Thu Jan 16 11:05:58 CET 2014


On 01/16/14, 10:49 , Chris Davis wrote:
> Hi Rainer,
> 
> Thanks for the response.  The folder I copy to is indeed local to
> the compute node and is not shared with the head node.  I've been
> looking into this further and think sqldf might be part of the
> problem.

Sounds likely to me. Contact the package maintainers or ask on their
mailing list.

Cheers,

Rainer

> I'm able to run a simple R program (repeated
> multiplication/division) with the program maxing out the CPU, and
> no noticeable network traffic.
> 
> Chris
> 
> 
> On Thursday, January 16, 2014 10:16 AM, Rainer M Krug
> <Rainer at krugs.de> wrote:
> 
> 
> On 01/15/14, 23:13 , Chris Davis wrote:
>> I'm trying to debug a problem where jobs running R code that I 
>> submit via qsub run very slowly and cause significant 
>> traffic between the head node and the nodes that the jobs are 
>> running on. The more instances of R I run, the more they 
>> collectively slow down.
> 
>> Much of the documentation I've found on running R on an HPC 
>> recommends using "module load R".  Trying this led to the same 
>> problem, since the R installation referred to by the module is on
>> the head node.  When I ran "lsof -u MyUserName" on the head node,
>> I could see open files related to the R executable and the
>> libraries.  These files were open multiple times, in proportion
>> to the number of running instances of R.
> 
>> I then installed a new version of R in the /tmp directory of the 
>> head node, zipped it all up, and then tried running jobs that
>> would copy this all to the /tmp directory of the nodes, unzip it,
>> and run a version of R locally.  The same symptoms occurred.
>> This time, running lsof on the head node didn't show any files
>> directly related to the R install, but I do see a reference to
>> my $PBS_O_WORKDIR, and there's a mention of several *.so
>> libraries located in lib directories like
>> /.rootfs/el6.4-1/lib64/.  As I understand it, these libraries are on
>> the head node.  The output of lsof also shows that the R code
>> isn't spending its time reading/writing to user-defined files on
>> the head node.
> 
>> The bash scripts I'm using to submit the jobs via qsub are 
>> slightly modified from ones that I used in the past to run java 
>> programs. Instead of "java -jar...", I'm now doing 
>> "/tmp/R-3.0.2/bin/Rscript --vanilla MyCode.R $JOB_NUMBER".  The
>> R code loads the necessary libraries on startup, and only reads from
>> and writes to the local node. Processing time when things are
>> working OK (tested on a single computer) is around several
>> hours, and the code runs on a single CPU, so no parallel
>> processing is involved. In terms of libraries, I'm just using
>> sqldf aside from the default R libraries, and as I understand
>> from the sqldf documentation, the database is stored in memory,
>> so there should be no network traffic associated with queries
>> that are running.
> 
>> Does anyone have ideas of how I can further debug this?  I'm not 
>> sure what else to check for and haven't been able to find a
>> mention on mailing lists of people running into the same problems
>> with network traffic slowing down running instances of R.
> 
> I ran a substantial number of simulations in R via qsub, and did not 
> experience slowdowns caused by network traffic, although I tried
> to minimise it by doing exactly what you were trying to do: copying
> (in my case) data to the nodes. I copied the data to $TMPDIR, which
> is *local* on the node and not linked.
> 
> I assume that the directory you are copying to is actually only 
> linked to the nodes (check with the HPC admin which folders
> are linked and which are local).
> 
> On the other hand, my R installation was always in my home
> directory, i.e. not on the nodes, but only linked. So I really
> don't think that the network traffic is caused by the R
> installation or anything like that.
> 
> 
> Cheers,
> 
> Rainer
> 
> 
> 
>> Regards,
> 
>> Chris
> 
> 
> 
>> _______________________________________________ R-sig-hpc
>> mailing list R-sig-hpc at r-project.org
>> <mailto:R-sig-hpc at r-project.org> 
>> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
> 
> 
> 
> 
> 
> 
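Following the suggestion to confirm which folders are local and which are only linked, a job can also check this itself at run time. The shell sketch below is a hypothetical helper (call it checkfs.sh; the name is illustrative only). It assumes GNU coreutils `stat`, whose `-f -c %T` prints the filesystem type name, and its list of network filesystem types is an assumption that may need adjusting for a given cluster. Since lsof showed libraries under /.rootfs/el6.4-1/lib64/, it may be worth checking / as well as /tmp, $TMPDIR and $PBS_O_WORKDIR.

```shell
#!/bin/bash
# Sketch: report whether directories a job uses are node-local or network-mounted.
# Relies on GNU coreutils `stat -f -c %T`, which prints the filesystem type name.

# Map a filesystem type name to "network" or "local".
# The list of network types is an assumption; adjust it for your cluster.
classify_fs_type() {
    case "$1" in
        nfs|nfs4|cifs|smb|smb2|lustre|gpfs|panfs|beegfs|afs)
            echo network ;;
        *)
            echo local ;;
    esac
}

# Print "<dir>: <local|network> (<fstype>)" for one directory.
check_dir() {
    local dir="$1" fstype
    fstype=$(stat -f -c %T "$dir") || return 1
    printf '%s: %s (%s)\n' "$dir" "$(classify_fs_type "$fstype")" "$fstype"
}

# Check every directory given on the command line; nothing happens without arguments.
for d in "$@"; do
    check_dir "$d"
done
```

For example, from inside the qsub job script: bash checkfs.sh / /tmp "$TMPDIR" "$PBS_O_WORKDIR". A "network" result for / or /tmp may suggest that even a locally unpacked R still loads its system libraries over the network.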

-- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa

Tel :       +33 - (0)9 53 10 27 44
Cell:       +33 - (0)6 85 62 59 98
Fax :       +33 - (0)9 58 10 27 44

Fax (D):    +49 - (0)3 21 21 25 22 44

email:      Rainer at krugs.de

Skype:      RMkrug


