[Rd] Deep Replicable Bug With AMD Threadripper MultiCore

Dirk Eddelbuettel edd at debian.org
Fri Apr 5 13:28:51 CEST 2019


On 4 April 2019 at 17:28, ivo welch wrote:
| The following program is whittled down from a much larger program that
| always works on Intel, and always works on AMD's Threadripper with
| lapply but not mclapply.  With mclapply on AMD, all processes go into
| "suspend" mode and the program then hangs.  This bug is replicable on an
| AMD Ryzen Threadripper 2950X 16-Core Processor (128GB RAM), running the
| latest Ubuntu 18.04, with R version 3.5.3 (2019-03-11) -- "Great Truth",
| invoked with --vanilla.  I hope this helps...it took quite a while to get
| it to this stage.  I sure hope that I am not reporting an old bug...
| 
| options("mc.cores"=4)
| library(data.table)
| library(parallel)

Just as you set mc.cores to 4 for parallel::mclapply, I would try throttling
data.table, which in its current version uses all cores by default. So do, say,

  setDTthreads(4)

and see if that helps. Try progressively lower values to see whether the hang
goes away. While there may well be a separate race condition in mclapply, it
may help not to oversubscribe the cores.
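
A minimal sketch of that experiment, assuming a Unix-alike (mclapply forks)
and an installed data.table. The toy job inside the loop is only a
hypothetical stand-in for the real workload, and the sequence 4, 2, 1 is just
one way to step the thread count down:

```r
library(parallel)
library(data.table)

options(mc.cores = 4L)   # same setting as in the original report

# Step data.table's thread count down and rerun a small mclapply job each
# time. The squaring job is a placeholder, not the reporter's workload.
for (n_threads in c(4L, 2L, 1L)) {
  setDTthreads(n_threads)
  message("data.table threads now: ", getDTthreads())
  out <- mclapply(1:8, function(i) i^2)
  stopifnot(length(out) == 8L)
}
```

If the real script stops hanging at some value, that value is a reasonable
cap to keep while the underlying issue is investigated.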

(FWIW, the next version of data.table, in queue at CRAN, is less aggressive
and has additional options for fine tuning.)

Dirk

| if (!file.exists("bugsample.csv")) {
|     NR <- 64833330
|     notused <- data.frame(v1=1:NR, v2=1:NR, v3=1:NR, x1=log(1:NR), x2=log(1:NR))
|     fwrite(notused, file="bugsample.csv")
|     stop("you can quit now and restart the program")
| }
| 
| ## nrows=Inf is needed!  Inf cannot be replaced by actual NR
| if (!exists("notused")) notused <- fread("bugsample.csv", nrows= Inf)
| 
| 
| sample <- data.frame( groupidentifier=c( rep(11111,2000), rep(22222, 4500 ) ) )
| sample$yvar <- sin(1:nrow(sample))
| sample$xvar <- 1:nrow(sample)
| 
| 
| testfun <- function(dl) {
|     with(dl, message("Working: ", first(groupidentifier), " with ", nrow(dl)))
| 
|     lapply( 1:nrow(dl), FUN=function(onedayindex) {
|         if ((onedayindex %% 500) != 0) return(NULL)
|         with(dl[1:onedayindex,],
|              c( tryCatch( coef(lm( yvar ~ xvar, data=dl[1:onedayindex,] ))[2],
|                           error = function(e) NA ) ) )
|     })
| }
| 
| 
| message("starting --- replicable hang with mclapply, but not lapply")
| 
| o <- mclapply(split( 1:nrow(sample), sample$groupidentifier ),
|               FUN=function(.index) testfun( sample[.index, , drop=FALSE] ))
| 
| message("never gets here with mclapply")
| 
| print( do.call("c", o[[1]]) )
| print( do.call("c", o[[2]]) )
| 
| 
| 
| --
| Ivo Welch (ivo.welch using ucla.edu)

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | edd using debian.org


