[R-sig-hpc] foreach + doMC not fully parallel?

Brian D Peyser PhD bpeyser at jhmi.edu
Tue Aug 31 19:59:21 CEST 2010


On Tue, 2010-08-31 at 13:16 +0200, Scionforbai wrote: 
> > I did a reboot and %dopar% started working fully again. I then did
> > suspend/resume and the problem came back.
> 
> > Linux 2.6.32-24-generic #38 SMP Mon Jul 26 15:08:51 EDT 2010 x86_64 GNU/Linux
> 
> Yes, it looks like the scheduler is confused after resume. In the
> scenario where the cores are not fully loaded, do you see high CPU
> usage by processes like 'ksoftirqd' or 'kthreadd' ("high" can be as
> little as 4-5%)?

I did not see any CPU usage from either of those processes when trying
to run multicore following suspend/resume (when my cores were
underutilized).
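
One way to put numbers on "underutilized" is to compare two snapshots of /proc/stat taken a short interval apart. The following is a minimal Python sketch, not something that was run for this report. The function names are hypothetical, and it assumes the standard Linux field order on each per-core line (user, nice, system, idle, iowait, irq, softirq, steal).

```python
import time


def parse_cpu_times(stat_text):
    """Return {core_name: (busy_jiffies, total_jiffies)} from /proc/stat text."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Keep per-core lines ("cpu0", "cpu1", ...); skip the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # Use only the first 8 counters; later "guest" fields are already
        # included in "user"/"nice" and would be double counted.
        values = [int(v) for v in fields[1:9]]
        idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
        total = sum(values)
        times[fields[0]] = (total - idle, total)
    return times


def busy_fraction(before, after):
    """Fraction of time each core was busy between two parsed snapshots."""
    result = {}
    for core, (busy1, total1) in after.items():
        if core not in before:
            continue  # core was not present in the first snapshot
        busy0, total0 = before[core]
        elapsed = total1 - total0
        result[core] = (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return result


def measure(interval=1.0, path="/proc/stat"):
    """Sample the given stat file twice, `interval` seconds apart."""
    with open(path) as f:
        first = parse_cpu_times(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_cpu_times(f.read())
    return busy_fraction(first, second)
```

Calling `measure()` while the %dopar% loop is running, once after a fresh boot and once after suspend/resume, would give one busy fraction per core for each scenario, which is easier to compare than eyeballing a system monitor.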

> If this is true, it is hardly a kernel problem (you are using the
> most up-to-date Long Term Support kernel, which is very heavily
> tested). The origin of the problem could be a daemon or some other
> software that deals with hardware (audio? video - like
> ati-nvidia-compiz? bluetooth? ethernet/wifi?) that doesn't get
> restarted properly after suspend and clogs the scheduler with a lot
> of soft IRQs. I have observed such behavior, for example, with the
> license managers for matlab or eclipse, and that was with a more
> recent kernel. Try powertop and see whether there is a big
> difference in interrupts between the two scenarios. Also note any
> error messages in your logs after resume, and compare the list of
> processes running after a fresh boot with the list after a resume.
> That should (maybe) point out which resource is blocking your
> scheduler.
> 
> Have fun,
> 
> Scion

According to powertop, when the problem occurs "[kernel scheduler] Load
balancing tick" performs fewer wakeups: ~220 when only a single core is
used, versus ~400 when 3 or all 4 cores are used and ~350 when only 2
cores are used. That's the only difference I can find with powertop.
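
As a cruder complement to powertop, interrupt activity in the two scenarios could also be compared from /proc/interrupts directly. The Python sketch below is illustrative only: the function names are hypothetical, it assumes the usual layout (a header row of CPU names, then one row per interrupt source with one count per CPU), and it will not reproduce powertop's per-cause attribution of wakeups.

```python
def parse_interrupts(text):
    """Return {label: (total_count, description)} from /proc/interrupts text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not an interrupt row
        tokens = rest.split()
        counts = []
        # Leading numeric tokens (at most one per CPU) are the counters.
        for tok in tokens[:ncpu]:
            if not tok.isdigit():
                break
            counts.append(int(tok))
        description = " ".join(tokens[len(counts):])
        totals[label.strip()] = (sum(counts), description)
    return totals


def interrupt_deltas(before, after):
    """List (delta, label, description), largest increase first."""
    deltas = []
    for label, (count, description) in after.items():
        previous = before.get(label, (0, description))[0]
        deltas.append((count - previous, label, description))
    return sorted(deltas, reverse=True)
```

The idea would be to read /proc/interrupts twice, a fixed number of seconds apart, while the parallel job runs, once after a fresh boot and once after a resume, and then compare the two delta lists for a source whose rate differs markedly.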

I closed Skype, and with it closed I sometimes get full multicore use,
but at other times the cores are still not fully utilized. Maybe an
issue with network utilization is causing this after suspend/resume.
Since Skype is often communicating with its server, that could be the
issue. I guess this will take some more troubleshooting, which will have
to wait for later.

-Brian


