[Rd] Peculiar timing result
Paul Gilbert
pgilbert at bank-banque-canada.ca
Fri Mar 3 17:36:13 CET 2006
Doug
This is probably not your reason, but I am finding my dual core Athlon
64 is much slower running 64 bit Linux and R than it was running 32 bit
Linux and R. All the programs are bigger. (Some, like the clock applet,
are a lot bigger for no obvious reason.) The difference is enough to
put my meager 1GB machine into swapping much more, with the result that
it is a lot slower.
Paul
Douglas Bates wrote:
>I have been timing a particular model fit using lmer on several
>different computers and came up with a peculiar result - the model fit
>is considerably slower on a dual-core Athlon 64 using Goto's
>multithreaded BLAS than on a single-core processor.
>
>Here is the timing on a single-core Athlon 64 3000+ running under
>today's R-devel with version 0.995-5 of the Matrix package.
>
>
>
>>library(Matrix)
>>data(star, package = 'mlmRev')
>>system.time(fm1 <- lmer(math~gr+sx+eth+cltype+(yrs|id)+(1|tch)+(yrs|sch), star, control = list(nit=0,grad=0,msV=1)))
>>
>>
> 0 241720.: 1.16440 0.335239 0.00000 1.78732 0.867209 0.382318 0.00000
> 1 239722.: 1.94952 5.00000e-10 0.00933767 1.65999 0.858003
>0.341520 0.00908757
> 2 239580.: 1.95924 0.0884059 0.00933767 1.65308 0.857487
>0.339296 0.00954718
> 3 239215.: 2.60877 0.0765848 0.0177699 1.45739 0.843562
>0.275100 0.0236849
> 4 239204.: 2.62582 0.106670 0.0239698 1.41976 0.841086
>0.261033 0.0267073
> 5 239176.: 2.63149 0.0787924 0.0367185 1.37952 0.838527
>0.245076 0.0301134
> 6 239141.: 2.64949 0.108534 0.0594586 1.28846 0.832543
>0.208404 0.0375456
> 7 239049.: 2.64794 0.0789214 0.121782 1.10436 0.819711
>0.126101 0.0524965
> 8 239004.: 2.66044 0.117957 0.181505 0.932202 0.798982
>0.0718116 0.0628958
> 9 238944.: 2.66310 0.0819653 0.334477 0.631735 0.740855
>0.258359 0.0806590
> 10 238893.: 2.72626 0.0975205 0.653432 0.703912 0.666067
>0.109922 0.201809
> 11 238892.: 2.74381 0.111146 0.666155 0.693544 0.662000 0.104060 0.207591
> 12 238888.: 2.75052 0.0990238 0.689199 0.694588 0.655781
>0.106516 0.216460
> 13 238861.: 2.80325 0.126935 1.05912 0.733914 0.556162 0.159296 0.360938
> 14 238832.: 2.82656 0.117617 1.59471 0.607916 0.441371
>0.0749944 0.976142
> 15 238811.: 2.87350 0.136332 1.59046 0.653141 0.353763 0.226061 1.79285
> 16 238810.: 2.87663 0.125135 1.58992 0.656808 0.352605 0.220488 1.79282
> 17 238806.: 2.89342 0.141551 1.58607 0.676523 0.344212 0.181833 1.79268
> 18 238804.: 2.90080 0.125137 1.56624 0.682921 0.261295 0.180598 1.74325
> 19 238802.: 2.91950 0.128569 1.56836 0.680436 0.336051 0.159940 1.80400
> 20 238801.: 2.92795 0.134762 1.56597 0.685121 0.331695 0.145547 1.80414
> 21 238801.: 2.93741 0.125667 1.56139 0.687827 0.332700 0.138854 1.81495
> 22 238800.: 2.94588 0.131757 1.55294 0.687909 0.330414 0.137834 1.82826
> 23 238799.: 2.96867 0.129716 1.52943 0.688678 0.323171 0.139912 1.84615
> 24 238799.: 2.98994 0.133378 1.52188 0.700038 0.337387 0.131403 1.77623
> 25 238799.: 3.00312 0.135308 1.51475 0.697550 0.311750 0.145683 1.78037
> 26 238799.: 3.00461 0.129920 1.51083 0.697666 0.306722 0.138745 1.81188
> 27 238799.: 3.00504 0.134882 1.50539 0.696745 0.302949 0.135897 1.84405
> 28 238799.: 3.00422 0.134013 1.47947 0.698115 0.303243 0.133806 1.86486
> 29 238799.: 3.00819 0.134378 1.48185 0.701871 0.307097 0.134637 1.84996
> 30 238799.: 3.01313 0.134279 1.49098 0.702883 0.304788 0.133682 1.86254
> 31 238799.: 3.01291 0.134253 1.49060 0.701818 0.303155 0.133771 1.84613
> 32 238799.: 3.01142 0.134314 1.48921 0.701782 0.303589 0.134439 1.84653
> 33 238799.: 3.01174 0.134315 1.48926 0.701641 0.304120 0.134145 1.84635
> 34 238799.: 3.01175 0.134304 1.48942 0.701740 0.303762 0.134185 1.84649
> 35 238799.: 3.01173 0.134307 1.48937 0.701724 0.303809 0.134206 1.84647
>[1] 43.10 3.78 48.41 0.00 0.00
>
>
>(If you run the timing yourself and don't want to see the iteration
>output, take the msV=1 out of the control list. I keep it in there so
>I can monitor the progress.)
>
>If I time the same model fit on a dual-core Athlon 64 X2 3800+ with
>the same version of R, BLAS and Matrix package, the timing ends up
>with something like
>
>90 140 235 0 0
>
>I do see that the multithreading is working on a calculation that is
>essentially BLAS-bound such as
>
>
>
>>mm <- Matrix(rnorm(1e6), nc = 1e3)
>>system.time(crossprod(mm))
>>
>>
>[1] 0.57 0.02 0.60 0.00 0.00
>
>On the X2 processor it still takes about 0.6 seconds user time but
>only 0.3 seconds elapsed time when the machine is otherwise idle and
>both cores are available for the calculation.
>
>Any suggestions why the dual-core processor is so much slower than the
>single core processor?
>
>By the way, I would be interested in accumulating timings of this
>model fit on other systems. If you do time it please send me
>(off-list) a summary of the version of R, version of the accelerated
>BLAS if you use them, processor speed and configuration (i.e.
>multiprocessor, multicore, etc.) and, if you know it, memory speed.
>
>This is an example of a complex multilevel model with crossed grouping
>factors fit to a relatively large (30000 observations on 10000
>students, 1400 teachers and 80 schools) data set.
>
>______________________________________________
>R-devel at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
More information about the R-devel
mailing list