[Rd] [External] Re: Workaround very slow NAN/Infinities arithmetic?
GILLIBERT, Andre
Andre@G||||bert @end|ng |rom chu-rouen@|r
Fri Oct 1 18:14:57 CEST 2021
> Mildly related (?) to this discussion, if you happen to be in a situation
> where you know something is a C NAN, but need to check if it's a proper R
> NA, the R_IsNA function is surprisingly (to me, at least) expensive to do
> in a tight loop because it calls the (again, surprisingly expensive to me)
> isnan function.
What is your platform? CPU, OS, compiler?
How much more expensive is it? 5-10 times slower than the improved code you wrote, or 100-200 times slower?
I analyzed the C and assembly source code of R_IsNA on an x86_64 GNU/Linux computer (Celeron J1900) with GCC 5.4 and found that it was somewhat expensive, but the main problems did not seem to come from isnan.
On x86_64, isnan was only responsible for a ucomisd xmm0, xmm0 instruction followed by a conditional jump. This instruction is slower on NAN than on ordinary floating-point values, but its speed seems acceptable.
On x86_32, isnan is responsible for a fld mem64, fst mem64, fucomip and a conditional jump: this is suboptimal, but things could be worse.
On x86_64, the first problem I noticed is that R_IsNA is not inlined, and the register-based x86_64 Linux calling convention does not necessarily help here, since it adds loads and stores between memory and registers.
The second problem (the worst part): writing a 64-bit double and then reading a 32-bit integer through the ieee_double union confuses the compiler, which generates very poor code with unnecessary loads and stores.
The second problem can be solved by using a union with a uint64_t field and a double field, and using & 0xFFFFFFFF to extract the low part of the uint64_t. This works well on x86_64, and also on x86_32, where GCC avoids useless emulation of 64-bit integers and reads the 32-bit integer directly.
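To make the idea concrete, here is a minimal sketch of such a check. It is not the actual R source: the names na_bits_sketch, is_na_sketch, make_na_sketch and NA_LOW_WORD_SKETCH are hypothetical, and the sketch assumes that an R NA is a NAN whose low 32 bits hold 1954. Declaring the function static inline is only one possible way to address the first problem as well.

```c
#include <math.h>    /* isnan */
#include <stdint.h>  /* uint64_t */

/* Hypothetical names throughout: this is an illustration, not R's code. */
typedef union {
    double   value;  /* the floating-point view */
    uint64_t bits;   /* the same 64 bits viewed as one integer */
} na_bits_sketch;

/* Assumption: an R NA is a NAN whose low 32 bits hold 1954. */
#define NA_LOW_WORD_SKETCH 1954u

/* Returns 1 if x is a NAN carrying the NA payload, 0 otherwise. */
static inline int is_na_sketch(double x)
{
    na_bits_sketch y;

    if (!isnan(x))
        return 0;
    y.value = x;
    /* One 64-bit read, then a mask: no separate 32-bit read of the union. */
    return (y.bits & 0xFFFFFFFFu) == NA_LOW_WORD_SKETCH;
}

/* Builds a NAN with the assumed NA payload, for testing only. */
static inline double make_na_sketch(void)
{
    na_bits_sketch y;

    y.bits = ((uint64_t)0x7FF00000u << 32) | NA_LOW_WORD_SKETCH;
    return y.value;
}
```

Because the low part is extracted by masking the 64-bit integer rather than by indexing an array of two 32-bit words, the sketch does not need to know which word comes first in memory, provided that double and uint64_t share the same byte order on the platform.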
--
Sincerely
André GILLIBERT