[Rd] Deep Replicable Bug With AMD Threadripper MultiCore

Tomas Kalibera
Fri Apr 5 14:10:26 CEST 2019


In addition, you can try a PSOCK cluster (see makeCluster, parLapply) 
to avoid the problem; it should help if the problem is somehow related 
to forking in mclapply().
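
A minimal sketch of the PSOCK approach, assuming a small toy data set 
shaped like 'sample' in the report. The helper slope_of(), the data frame 
d, and the worker count of 2 are illustrative assumptions, not part of the 
original program:

```r
library(parallel)

# Hypothetical stand-in for the per-group work; the real testfun() would go here.
slope_of <- function(idx, d) {
    unname(coef(lm(yvar ~ xvar, data = d[idx, , drop = FALSE]))[2])
}

# Small illustrative data set with two groups, as in the report.
d <- data.frame(groupidentifier = c(rep(11111, 200), rep(22222, 450)))
d$yvar <- sin(seq_len(nrow(d)))
d$xvar <- seq_len(nrow(d))

# PSOCK workers are fresh R processes connected over sockets, not forks
# of the current session, so nothing is inherited from the parent.
cl <- makeCluster(2)
groups <- split(seq_len(nrow(d)), d$groupidentifier)
o <- tryCatch(
    parLapply(cl, groups, slope_of, d = d),   # d is shipped to each worker
    finally = stopCluster(cl)                 # always shut the workers down
)
print(o)
```

Because the workers start empty, any package the worker function needs 
(for example data.table, if first() is used) has to be loaded on them 
explicitly, e.g. with clusterEvalQ(cl, library(data.table)).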

The problem you are seeing may be in base R, in data.table, or in the 
interaction between the two (mclapply() from base R uses forking 
directly, data.table uses OpenMP). If you think the bug is in base R, it 
would be much better if you could find a reproducible example that 
uses only packages shipped directly with R; otherwise it might be best to 
contact the maintainer of data.table.
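
As a rough starting point, a base-R-only variant of the reported program 
might look like the sketch below. It assumes a Unix-alike system (mclapply 
forks), replaces data.table's first() with plain indexing, and leaves out 
the large 'notused' object, which would have to be rebuilt with base R as 
well. The names d and testfun_base are illustrative, and whether this 
variant reproduces the hang is not known:

```r
library(parallel)          # shipped with R; no data.table involved
options(mc.cores = 2)

# Same shape as 'sample' in the report, under a name that does not
# mask base::sample.
d <- data.frame(groupidentifier = c(rep(11111, 2000), rep(22222, 4500)))
d$yvar <- sin(seq_len(nrow(d)))
d$xvar <- seq_len(nrow(d))

testfun_base <- function(dl) {
    # groupidentifier[1] replaces data.table::first(groupidentifier)
    message("Working: ", dl$groupidentifier[1], " with ", nrow(dl))
    lapply(seq_len(nrow(dl)), function(i) {
        if ((i %% 500) != 0) return(NULL)   # fit only every 500th row
        tryCatch(coef(lm(yvar ~ xvar, data = dl[1:i, ]))[2],
                 error = function(e) NA)
    })
}

o <- mclapply(split(seq_len(nrow(d)), d$groupidentifier),
              function(idx) testfun_base(d[idx, , drop = FALSE]))

print(do.call("c", o[[1]]))
print(do.call("c", o[[2]]))
```

If this version runs cleanly while the data.table version hangs, that 
would point towards data.table or the interaction rather than base R alone.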

Please also make sure to use the latest version of R 3.5 (or R-devel). 
The implementation of forking in the parallel package, and hence also in 
mclapply(), has been rewritten since R 3.4.
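
A quick way to confirm which version a session is actually running, as a 
small sketch. The threshold 3.5.3 is simply the version named in the 
report, and up_to_date is an illustrative name:

```r
# Print the full version string of the running R session.
print(R.version.string)

# getRversion() returns a comparable version object, so this is a proper
# version comparison and not a string comparison.
up_to_date <- getRversion() >= "3.5.3"
print(up_to_date)
```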

Best
Tomas

On 4/5/19 1:28 PM, Dirk Eddelbuettel wrote:
> On 4 April 2019 at 17:28, ivo welch wrote:
> | The following program is whittled down from a much larger program that
> | always works on Intel, and always works on AMD's threadripper with
> | lapply but not mclapply.  With mclapply on AMD, all processes go into
> | "suspend" mode and the program then hangs.  This bug is replicable on an
> | AMD Ryzen Threadripper 2950X 16-Core Processor (128GB RAM), running
> | latest ubuntu 18.04.  The R version 3.5.3 (2019-03-11) -- "Great Truth" ,
> | invoked with --vanilla.  I hope this helps...it took quite a while to get
> | it to this stage.  I sure hope that I am not reporting an old bug...
> |
> | options("mc.cores"=4)
> | library(data.table)
> | library(parallel)
>
> Just as you set mc.cores to 4 for parallel::mclapply, I would try throttling
> data.table, which in its current version goes for all cores. So do, say,
>
>    setDTthreads(4)
>
> and see if that helps. Try lower and lower values to see if you get by.
> While there may well be a different race condition in mclapply, it may help
> to not overschedule.
>
> (FWIW, the next version of data.table, in queue at CRAN, is less aggressive
> and has additional options for fine tuning.)
>
> Dirk
>
> | if (!file.exists("bugsample.csv")) {
> |     NR <- 64833330
> |     notused <- data.frame(v1=1:NR, v2=1:NR, v3=1:NR, x1=log(1:NR),
> | x2=log(1:NR))
> |     fwrite(notused, file="bugsample.csv")
> |     stop("you can quit now and restart the program")
> | }
> |
> | ## needed!  Inf cannot be replaced by actual NR
> | if (!exists("notused")) notused <- fread("bugsample.csv", nrows= Inf)
> |
> |
> | sample <- data.frame( groupidentifier=c( rep(11111,2000), rep(22222, 4500 )
> | ) )
> | sample$yvar <- sin(1:nrow(sample))
> | sample$xvar <- 1:nrow(sample)
> |
> |
> | testfun <- function(dl) {
> |     with(dl, message("Working: ", first(groupidentifier), " with ",
> | nrow(dl)))
> |
> |     lapply( 1:nrow(dl), FUN=function(onedayindex) {
> |         if ((onedayindex %% 500) != 0) return(NULL)
> |         with(dl[1:onedayindex,],
> |              c( tryCatch( coef(lm( yvar ~ xvar, data=dl[1:onedayindex,]
> | ))[2], error = function(e) NA ) ) )
> |     })
> | }
> |
> |
> | message("starting --- replicable hang with mclapply, but not lapply")
> |
> | o <- mclapply(split( 1:nrow(sample), sample$groupidentifier ),
> |               FUN=function(.index) testfun( sample[.index, , drop=FALSE] ))
> |
> | message("never gets here with mclapply")
> |
> | print( do.call("c", o[[1]]) )
> | print( do.call("c", o[[2]]) )
> |
> |
> |
> | --
> | Ivo Welch (ivo.welch using ucla.edu)
>


