[R-sig-hpc] R, Nomad, HTCondor, etc... and future

Fri May 22 03:59:43 CEST 2020

On 22 May 2020 at 11:46, David Bellot wrote:
| Thanks George.
| To give more color to what I'm trying to achieve, let me describe the two
| opposite use cases. My use case is in R obviously, and I run one-shot jobs
| to explore data sets as fast as possible and run optimization algos in
| which the objective function is really cpu-intensive to compute.
| 
| At the same time, other people in the same organization want to run
| services, written in other languages and use the same cluster of computers.
| Those services are very different in nature but in general, the idea is to
| have a collection of processes always ready to answer a request when
| needed. Ideally, the same cluster should be used by everyone so as to
| maximize its uptime, not waste expensive resources, etc... And ideally,
| I don't want to have many job schedulers/distribution engines to manage at
| the same time. Kind of a Holy Grail, I concede.
| 
| Hence me looking at things like Nomad, HTCondor, etc...

I don't see in the above how your 'one-shot job' is different from your
colleagues' need to send spot requests.

I found slurm reasonable in the past, and it has only gotten more widely used
/ available since.  It will provide you with access to the compute resource,
will account for 'who does what' and can schedule / resource (which I never
really needed, and sounds like you don't either). Plus it will give you an easy
view of what is currently up or down, available etc pp.

The devil is, as always, in the details. I'd say experiment a little and
take it from there.

Dirk

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | edd using debian.org