[Rd] Detecting physical CPUs in detectCores() on Linux platforms

Julian Hniopek ju||@n@hn|opek @end|ng |rom un|-jen@@de
Mon Aug 7 14:47:46 CEST 2023


On Mon, 2023-08-07 at 07:12 -0500, Dirk Eddelbuettel wrote:
> 
> On 7 August 2023 at 08:48, Nils Kehrein wrote:
> > I recently noticed that `detectCores()` ignores the `logical=FALSE`
> > argument on Linux platforms. This means that the function will always
> > return the number of logical CPUs, i.e. it will count the number of
> > threads that can theoretically run in parallel due to, e.g.,
> > hyper-threading. Unfortunately, this can cause problems in
> > high-performance computing use cases where hyper-threading might
> > degrade performance instead of improving it.
> > 
> > Currently, src/library/parallel/R/detectCores.R uses the following
> > R/shell code fragment to identify the number of logical CPUs:
> > linux = 'grep "^processor" /proc/cpuinfo 2>/dev/null | wc -l'
> > 
> > As far as I understand, one could derive the number of online physical
> > CPUs by parsing the contents of /sys/devices/system/cpu/*, but that
> > seems rather cumbersome. Instead, could we amend the R code with the
> > following line?
> > linux = if(logical) 'grep "^processor" /proc/cpuinfo 2>/dev/null | wc -l' else
> >     'lscpu -b --parse="CORE" | tail -n +5 | sort -u | wc -l'
> 
> That's good, but you also need to protect this against `lscpu` not being
> in the path. Maybe `if (logical || !nzchar(Sys.which("lscpu")))` ?
> 
> Dirk
> 
Alternatively, using only /proc/cpuinfo and POSIX utilities (which should
be in the path on all Linux systems):

awk '/^physical id/{PHYS_ID=$NF; next} /^cpu cores/{print PHYS_ID" "$NF;}' \
    /proc/cpuinfo 2>/dev/null | sort | uniq | awk '{sum+=$NF;} END {print sum}'

Parses /proc/cpuinfo for the physical id and the `cpu cores` value (the
number of cores in that package) of each processor entry. Only unique
combinations of physical id (i.e. socket) and core count are kept. These
core counts are then summed over all physical ids to get the total number
of physical cores.
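To see what each stage does, the same pipeline can be wrapped in a shell function that takes the cpuinfo path as an argument and run against a small hand-written excerpt instead of the live /proc/cpuinfo. The function name `count_physical_cores` and the sample (two sockets, two cores per socket, two hyper-threads per core, with only the `processor`, `physical id` and `cpu cores` fields) are illustrative assumptions, not output from a real machine.

```shell
#!/bin/sh
# Hypothetical wrapper around the awk | sort | uniq | awk pipeline above,
# reading the file given as $1 (default: /proc/cpuinfo).
# Stage 1 prints "<physical id> <cpu cores>" for every logical CPU,
# sort | uniq keeps one line per (socket, core count) pair,
# and stage 2 sums the core counts.
count_physical_cores() {
  awk '/^physical id/ {PHYS_ID = $NF; next}
       /^cpu cores/   {print PHYS_ID " " $NF}' "${1:-/proc/cpuinfo}" 2>/dev/null |
    sort | uniq |
    awk '{sum += $NF} END {print sum}'
}

# Illustrative sample: 2 sockets x 2 cores x 2 threads = 8 logical CPUs,
# so stage 1 emits "0 2" four times and "1 2" four times.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
i=0
for phys in 0 0 0 0 1 1 1 1; do
  printf 'processor\t: %d\nphysical id\t: %s\ncpu cores\t: 2\n\n' "$i" "$phys"
  i=$((i + 1))
done > "$sample"

count_physical_cores "$sample"    # prints 4
```

If the file cannot be read, the first awk prints nothing and the final awk prints an empty line.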

Something I had lying around. Someone with better awk skills could
probably do the sorting and filtering in awk as well to save on pipes.
In my experience, it works on single- and multi-socket AMD/Intel systems.
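Taking up the suggestion above, the sorting and deduplication can be done inside a single awk program with an associative array keyed on the same (physical id, cpu cores) pair, which removes the extra processes. This is a sketch assuming an x86-style /proc/cpuinfo with `physical id` and `cpu cores` fields; the function name `count_physical_cores_awk` is hypothetical.

```shell
#!/bin/sh
# Hypothetical single-awk equivalent of: awk ... | sort | uniq | awk ...
# Reads the file given as $1 (default: /proc/cpuinfo; "-" means stdin).
count_physical_cores_awk() {
  awk '
    /^physical id/ { phys = $NF; next }
    /^cpu cores/ {
      key = phys " " $NF
      # Count each (physical id, cpu cores) combination only once,
      # which is what sort | uniq did in the pipeline version.
      if (!(key in seen)) { seen[key] = 1; total += $NF }
    }
    END { print total + 0 }
  ' "${1:-/proc/cpuinfo}" 2>/dev/null
}

# Example on a tiny illustrative excerpt: 2 sockets with 4 cores each,
# two logical CPUs listed per socket.
printf 'physical id\t: 0\ncpu cores\t: 4\nphysical id\t: 0\ncpu cores\t: 4\nphysical id\t: 1\ncpu cores\t: 4\nphysical id\t: 1\ncpu cores\t: 4\n' |
  count_physical_cores_awk -    # prints 8
```

One behavioural difference: when neither field is present, the pipeline prints an empty line, whereas this version prints 0.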

Julian
> > 
> > This solution uses `lscpu` from util-linux. The -b switch makes sure
> > that only online CPUs/cores are listed, and due to --parse="CORE" the
> > output will contain only a single column with logical core ids (after
> > a header of comment lines starting with `#`, which `tail -n +5` skips).
> > It seems to do the job in my view, but there might be edge cases for
> > exotic CPU topologies that I am not aware of.
> > 
> > Thank you, Nils
> > 
> > ______________________________________________
> > R-devel using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> 
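For comparison, here is a shell sketch of the lscpu-based variant combined with the availability check suggested above: `command -v lscpu` plays the role of `Sys.which("lscpu")`, and when lscpu is missing the function falls back to the existing logical-CPU count. The function name `count_cores` and the choice of fallback are assumptions, not R's implementation. It also drops the `#` header lines with `grep -v '^#'` instead of assuming a fixed header length with `tail -n +5`.

```shell
#!/bin/sh
# Hypothetical helper: physical cores via lscpu if it is installed,
# otherwise the current logical-CPU count from /proc/cpuinfo.
count_cores() {
  if command -v lscpu >/dev/null 2>&1; then
    # -b: online CPUs only; --parse=CORE: one logical core id per CPU.
    # Header lines start with '#' and are dropped before counting ids.
    lscpu -b --parse=CORE | grep -v '^#' | sort -u | wc -l
  else
    grep '^processor' /proc/cpuinfo 2>/dev/null | wc -l
  fi
}

count_cores
```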

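Parsing /sys/devices/system/cpu/* need not be much more involved: each core's `topology/thread_siblings_list` file lists the logical CPUs sharing that core, so counting the distinct lists gives the number of cores. This is a sketch assuming the kernel provides `thread_siblings_list` for the CPUs to be counted (newer kernels also offer it under the name `core_cpus_list`); the function name `count_cores_sysfs` and its optional root-directory argument, added only for testing, are hypothetical.

```shell
#!/bin/sh
# Hypothetical sysfs-based count: every logical CPU of a core reports the
# same thread_siblings_list, so the number of distinct lists is the
# number of cores. $1 overrides the sysfs root (useful for testing).
count_cores_sysfs() {
  root=${1:-/sys/devices/system/cpu}
  cat "$root"/cpu[0-9]*/topology/thread_siblings_list 2>/dev/null |
    sort -u | wc -l
}

count_cores_sysfs
```

The `cpu[0-9]*` glob matches `cpu0`, `cpu1`, ... but not sibling directories such as `cpufreq` or `cpuidle`.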

