[Rd] Detecting physical CPUs in detectCores() on Linux platforms

Simon Urbanek
Tue Aug 8 01:21:27 CEST 2023


First, detecting HT vs. cores is not necessarily possible in general: Linux may assign a core id to each hyperthread, depending on circumstances. For example:

$ grep 'cpu cores' /proc/cpuinfo | uniq
cpu cores	: 32
$ grep 'model name' /proc/cpuinfo | uniq
model name	: Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz

and you can look up that the Xeon Gold 6142 has 16 cores, so the 32 reported here counts hyperthreads.
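
To check what a particular machine reports, it can help to put "cpu cores" next to "siblings" for each socket. The following is only a minimal sketch: summarize_cpuinfo is a hypothetical helper name, and it assumes the "physical id", "siblings" and "cpu cores" fields are present, which is not the case on every architecture.

```shell
# Hypothetical helper: print "cpu cores" and "siblings" once per socket.
# The cpuinfo path is an optional argument so a saved copy can be inspected.
summarize_cpuinfo() {
  awk -F'[ \t]*:[ \t]*' '
    $1 == "physical id" { id = $2 }          # remember the current socket
    $1 == "siblings"    { sib[id] = $2 }     # logical CPUs on that socket
    $1 == "cpu cores"   { cores[id] = $2 }   # cores as reported by the kernel
    END {
      for (i in cores)
        print "socket " i ": cpu cores=" cores[i] " siblings=" sib[i]
    }
  ' "${1:-/proc/cpuinfo}" | sort
}
```

If the two numbers agree, /proc/cpuinfo alone does not say whether HT is off or whether hyperthreads are being counted as cores.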

Second, instead of "awk"ward contortions it's easily done in R with something like

d=read.dcf("/proc/cpuinfo")
sum(as.integer(tapply(
  d[,grep("cpu cores",colnames(d))],
  d[,grep("physical id",colnames(d))], `[`, 1)))

which avoids subprocesses, quoting hell and all such issues...
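
For cross-checking the result outside R, the same computation (take the first "cpu cores" value for each "physical id" and sum them) also fits in a single awk pass without the sort/uniq pipes. This is a sketch only: count_physical_cores is a hypothetical name, and it assumes both fields exist in the file.

```shell
# Hypothetical helper: sum the first "cpu cores" value seen per "physical id".
# The cpuinfo path is an optional argument so a saved copy can be used.
count_physical_cores() {
  awk -F'[ \t]*:[ \t]*' '
    $1 == "physical id" { id = $2 }                  # current socket
    $1 == "cpu cores" && !(id in seen) {             # first entry per socket
      seen[id] = 1
      total += $2
    }
    END { print total + 0 }                          # prints 0 if fields are missing
  ' "${1:-/proc/cpuinfo}"
}
```

A result of 0 means the fields were not found, which a caller would have to treat as "unknown" rather than as a core count.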

Cheers,
Simon


> On 8/08/2023, at 12:47 AM, Julian Hniopek <julian.hniopek using uni-jena.de> wrote:
> 
> On Mon, 2023-08-07 at 07:12 -0500, Dirk Eddelbuettel wrote:
>> 
>> On 7 August 2023 at 08:48, Nils Kehrein wrote:
>>> I recently noticed that `detectCores()` ignores the `logical=FALSE`
>>> argument on Linux platforms. This means that the function will always
>>> return the number of logical CPUs, i.e. it will count the number of
>>> threads that theoretically can run in parallel due to e.g.
>>> hyper-threading. Unfortunately, this can result in issues in
>>> high-performance computing use cases where hyper-threading might
>>> degrade performance instead of improving it.
>>> 
>>> Currently, src/library/parallel/R/detectCores.R uses the following
>>> R/shell code fragment to identify the number of logical CPUs:
>>> linux = 'grep "^processor" /proc/cpuinfo 2>/dev/null | wc -l'
>>> 
>>> As far as I understand, one could derive the number of online physical
>>> CPUs by parsing the contents of /sys/devices/system/cpu/* but that
>>> seems rather cumbersome. Instead, could we amend the R code with the
>>> following line?
>>> linux = if(logical) 'grep "^processor" /proc/cpuinfo 2>/dev/null | wc -l'
>>>         else 'lscpu -b --parse="CORE" | tail -n +5 | sort -u | wc -l'
>> 
>> That's good but you also need to at least protect this from `lscpu`
>> not being in the path.  Maybe
>> `if (logical && nzchar(Sys.which("lscpu")))` ?
>> 
>> Dirk
>> 
> Alternatively, using only POSIX utils, which should be in the path of
> all Linux systems, and /proc/cpuinfo:
> 
> awk '/^physical id/{PHYS_ID=$NF; next} /^cpu cores/{print PHYS_ID" "$NF;}' /proc/cpuinfo 2>/dev/null | sort | uniq | awk '{sum+=$NF;} END {print sum}'
> 
> This parses /proc/cpuinfo for the physical id (i.e. socket) and the
> number of physical cores of each CPU entry, keeps only the unique
> combinations of physical id and core count, and then sums the core
> counts over the physical ids to get the total number of physical cores.
> 
> Something I had lying around. Someone with better awk skills could
> probably do sorting and filtering in awk as well to save on pipes.
> Works on single and multisocket AMD/Intel from my experience.
> 
> Julian
>>> 
>>> This solution uses `lscpu` from `sys-utils`. The -b switch makes sure
>>> that only online CPUs/cores are listed and due to the --parse="CORE",
>>> the output will contain only a single column with logical core ids. It
>>> seems to do the job in my view, but there might be edge cases for
>>> exotic CPU topologies that I am not aware of.
>>> 
>>> Thank you, Nils
>>> 
>>> ______________________________________________
>>> R-devel using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>> 
> 


