[R-sig-Geo] Parallel processing with extract()/randomForest() in VM

Bede-Fazekas Ákos
Tue May 28 07:25:54 CEST 2019


Dear Asantos,

Instead of
exr<-raster::extract(r, polys)
you can use:

beginCluster(8)
exr <- raster::clusterR(x = r, fun = function(raster) {
  raster::extract(x = raster, y = polys)
}, export = "polys")
endCluster()
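
As an alternative illustration of the same idea (spreading the extraction over several cores), here is a minimal sketch that splits the work by polygon and uses parallel::mclapply() instead of clusterR(). This is a swapped-in technique, not the call above. It assumes a Linux machine (mclapply() forks, so it does not run in parallel on Windows), uses a small synthetic raster so that nothing has to be downloaded, and n_cores = 2 is an arbitrary choice.

```r
library(raster)
library(parallel)

# Synthetic global 1-degree raster (assumption: stands in for the WorldClim layer)
r <- raster(nrows = 180, ncols = 360)
values(r) <- seq_len(ncell(r))

# The two polygons from the original question
cds1 <- rbind(c(-180,-20), c(-160,5), c(-60,0), c(-160,-60), c(-180,-20))
cds2 <- rbind(c(80,0), c(100,60), c(120,0), c(120,-55), c(80,0))
polys <- spPolygons(cds1, cds2)

n_cores <- 2L  # hypothetical worker count; set it to the number of CPUs available

# One task per polygon; each forked worker extracts the cells of a single polygon
exr_par <- mclapply(seq_len(length(polys)), function(i) {
  raster::extract(r, polys[i])[[1]]
}, mc.cores = n_cores)
```

With only two polygons at most two workers can be busy, so the gain from this kind of split depends on how many polygons there are and how evenly their sizes are distributed.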

HTH,
Ákos Bede-Fazekas
Hungarian Academy of Sciences

On 2019.05.28 at 3:47, ASANTOS via R-sig-Geo wrote:
> Dear R-Sig-Geo Members,
>
> I created a virtual machine (VM) in Google Cloud running Ubuntu 18.04
> with 8 CPUs, 30 GB of RAM and R 3.6.0, but I have not managed to speed up
> my spatial analysis as much as I expected. When I use the snow and doMC
> packages with all 8 CPUs, the operation uses only 12.54% of the available
> capacity. The goal is to run extract() from raster and a random forest
> with randomForest(). A gain of 18 seconds does not seem very good to me,
> so my question is: is there any way to improve on that? In my test,
> I ran:
>
> # Get the number of processors from the Ubuntu terminal
> foresteyebrazil@superforettech1:~$ cat /proc/cpuinfo | grep process | wc -l
> 8
> #Packages
> library(raster)
> library(snow)
> library(doMC)
> library(randomForest)
> registerDoMC()
> # Get an elevation raster from WorldClim
> r<-getData('worldclim', var='alt', res=5)
> # 1) Use extract()/randomForest() in Virtual Machine ----------------------------
> start_time<-Sys.time()
> # SpatialPolygons
> cds1<-rbind(c(-180,-20), c(-160,5), c(-60, 0), c(-160,-60), c(-180,-20))
> cds2<-rbind(c(80,0), c(100,60), c(120,0), c(120,-55), c(80,0))
> polys<-spPolygons(cds1, cds2)
> # Extract
> exr<-raster::extract(r, polys)
> tr<-ifelse(exr[[2]]<10,c("A"),c("B"))
> df<-cbind(tr,exr[[2]], sqrt(exr[[2]]))
> df2<-data.frame(as.factor(df[,1]),as.numeric(as.character(df[,2])),as.numeric(as.character(df[,3])))
> df2<-df2[complete.cases(df2),]
> colnames(df2)<-c("res1","var1","var2")
> res<-NULL
> for(w in 1:9){
> mod_RF<-randomForest(x=cbind(df2$var1,df2$var2), y=df2$res1, ntree=100,
> mtry=2)
> res=rbind(res,cbind(w,mean(mod_RF$err.rate[,1])*100))
> }
> end_time<-Sys.time()
> end_time-start_time
> #
> #Time difference of 38.72528 secs
> # 2) Use extract() with snow and doMC packages in Virtual Machine ----------------------------
> start_time<-Sys.time()
> # SpatialPolygons
> cds1<-rbind(c(-180,-20), c(-160,5), c(-60, 0), c(-160,-60), c(-180,-20))
> cds2<-rbind(c(80,0), c(100,60), c(120,0), c(120,-55), c(80,0))
> polys<-spPolygons(cds1, cds2)
> # Extract
> beginCluster(n=8)
> exr<-raster::extract(r, polys)
> tr<-ifelse(exr[[2]]<10,c("A"),c("B"))
> df<-cbind(tr,exr[[2]], sqrt(exr[[2]]))
> df2<-data.frame(as.factor(df[,1]),as.numeric(as.character(df[,2])),as.numeric(as.character(df[,3])))
> df2<-df2[complete.cases(df2),]
> colnames(df2)<-c("res1","var1","var2")
> endCluster()
> res<-NULL
> mod_RF2<-foreach(w=1:9) %dopar%{
> randomForest(x=cbind(df2$var1,df2$var2), y=df2$res1, ntree=100, mtry=2)
> }
> # mod_RF2 is a list of 9 models, so take the mean OOB error of each one
> res<-do.call(rbind,lapply(mod_RF2,function(m) mean(m$err.rate[,1])*100))
> end_time<-Sys.time()
> end_time-start_time
> #
> #Time difference of 20.57027 secs
>
> Thanks in advance,
>
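
The parallel randomForest() loop in the quoted message can be sketched as follows. foreach() returns one result per iteration, so the error rate has to be computed inside the loop body (or per list element afterwards) and combined explicitly. The synthetic data frame, the seed, and cores = 2 are assumptions made for illustration, not values from the original test. doMC relies on forking, so this is meant for Linux or macOS.

```r
library(randomForest)
library(doMC)  # attaches foreach as a dependency

registerDoMC(cores = 2)  # hypothetical worker count; e.g. 8 on the VM described above

# Synthetic stand-in for df2 (assumption: mimics the res1/var1/var2 layout)
set.seed(1)
var1 <- runif(500, 0, 100)
df2 <- data.frame(res1 = factor(ifelse(var1 < 10, "A", "B")),
                  var1 = var1,
                  var2 = sqrt(var1))

# Fit 9 forests in parallel; each iteration returns its index and mean OOB error (%)
res <- foreach(w = 1:9, .combine = rbind) %dopar% {
  mod <- randomForest(x = df2[, c("var1", "var2")], y = df2$res1,
                      ntree = 100, mtry = 2)
  c(w = w, oob_error_pct = mean(mod$err.rate[, 1]) * 100)
}
```

Here res is a 9-row matrix with one row per fitted model, which matches the shape built with rbind() in the sequential version.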


