[R-sig-hpc] Why pure computation time in parallel is longer than the serial version?
Ei-ji Nakama
nakama at ki.rim.or.jp
Mon Feb 24 05:44:25 CET 2014
Hi,
2014-02-22 19:30 GMT+09:00 Xuening Zhu <puddingnnn529 at gmail.com>:
>
> My cpu is *Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz.* There are 2 physical
> cores and additional 2 logical cores. The memory size is 8G. And my
Theoretical peak performance of your CPU:
using SIMD: 4 FLOPs per clock x 2.5 GHz x 2 physical cores = 20 GFLOPS
using AVX: 8 FLOPs per clock x 2.5 GHz x 2 physical cores = 40 GFLOPS
# Because there are two physical cores, there are only two computing units.
The operation count of DGEMM is O(N^3).
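The peak figures above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the function and constant names are made up for illustration, and the FLOPs-per-clock values are the ones stated above.

```python
# Theoretical peak = FLOPs per clock per core x clock (GHz) x physical cores.
CLOCK_GHZ = 2.5
PHYSICAL_CORES = 2  # only physical cores are counted, as in the estimate above


def peak_gflops(flops_per_clock):
    """Return the theoretical peak in GFLOPS for the whole CPU."""
    return flops_per_clock * CLOCK_GHZ * PHYSICAL_CORES


simd_peak = peak_gflops(4)  # SIMD: 4 FLOPs per clock
avx_peak = peak_gflops(8)   # AVX: 8 FLOPs per clock
print(f"SIMD: {simd_peak:.0f} GFLOPS, AVX: {avx_peak:.0f} GFLOPS")
```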
>
> I choose a 10^3 * 10^4 matrix and wants to evaluate its
> multiplication(t(m)%*%m) time. I don't consider tcrossprod() because I just
> want to make the computation longer. Maybe more cases can be compared later.
The operation count of this computation is O(N^3).
Required calculation: 2*(3e3 * 4e3 * (3e3+4e3)/2) = 84 GFLOP
using SIMD: 84/20 = 4.2 sec
using AVX: 84/40 = 2.1 sec
So it takes 2.1 seconds at the theoretical peak performance of your CPU.
Since the effective efficiency is usually in the range of about 80% to 90%,
it will likely take about 2.6 seconds in practice.
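The time estimate above can be checked with a short Python sketch. The dimensions 3e3 and 4e3 are taken from the expression in this reply; the variable and function names are hypothetical, and the 80% to 90% efficiency range is the rough figure given above, not a measurement.

```python
# Operation count from the expression above, in GFLOP.
rows, cols = 3e3, 4e3
work_gflop = 2 * (rows * cols * (rows + cols) / 2) / 1e9


def seconds(peak_gflops, efficiency=1.0):
    """Estimated run time at a given peak rate and efficiency (0..1)."""
    return work_gflop / (peak_gflops * efficiency)


print(f"work: {work_gflop:.0f} GFLOP")
print(f"SIMD peak (20 GFLOPS): {seconds(20.0):.1f} sec")
print(f"AVX peak (40 GFLOPS): {seconds(40.0):.1f} sec")
# Assumed efficiency range of 80% to 90% on the AVX peak.
print(f"AVX at 90%: {seconds(40.0, 0.9):.2f} sec")
print(f"AVX at 80%: {seconds(40.0, 0.8):.2f} sec")
```

At 80% efficiency this gives a little over 2.6 seconds, which matches the rough figure above.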
> user system elapsed
> 10.164 0.512 5.549
Maybe this is the performance of a Nehalem core.
Also, hyperthreading decreases cache efficiency in this computation.
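To put the quoted elapsed time in context, the following Python sketch converts it into an achieved rate and compares that with the two peak figures. It assumes the estimate of 84 for the operation count is accurate; the variable names are hypothetical.

```python
# Achieved throughput implied by the quoted timing, assuming 84 GFLOP of work.
work_gflop = 84.0
elapsed_s = 5.549  # "elapsed" value from the quoted timing

achieved = work_gflop / elapsed_s  # GFLOPS actually sustained
print(f"achieved: {achieved:.1f} GFLOPS")

# Fraction of each theoretical peak that this rate represents.
fractions = {}
for name, peak in (("SIMD", 20.0), ("AVX", 40.0)):
    fractions[name] = achieved / peak
    print(f"{name} peak {peak:.0f} GFLOPS: {fractions[name]:.0%} of peak")
```

Under that assumption the measured run sustains roughly three quarters of the SIMD peak but well under half of the AVX peak.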
Best Regards,
--
EI-JI Nakama <nakama (a) ki.rim.or.jp>
"中間栄治" <nakama (a) ki.rim.or.jp>