[R] How to parallelize a process called by a socket connection

James Spottiswoode james sending from jsasoc.com
Sat Feb 1 20:24:51 CET 2020


Hi R Experts,

I’m using R version 3.4.3 running under Linux on an AWS EC2 instance.  I have an R script listening on a port for a socket connection; it passes incoming data to a function, whose results are then passed back to the calling machine.  Here’s the function that listens for a socket connection:

# define server function
server <- function() {
  while (TRUE) {
    con <- socketConnection(host = "localhost", port = server_port,
                            blocking = TRUE, server = TRUE,
                            open = "r+", timeout = 100000000)
    data <- readLines(con, 1L, skipNul = TRUE, ok = TRUE)
    response <- check(data)
    if (!is.null(response)) writeLines(response, con)
    close(con)  # release this connection before accepting the next one
  }
}

The server function expects to receive a character string, which is then passed to the function check().  check() is a large, complex routine that does text analysis and many other things and returns a JSON string to be passed back to the calling machine.
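For testing, a minimal client along these lines will exercise the server (a sketch; the host and port defaults are placeholders, and server_port is the same variable the server uses):

# minimal test client (sketch; host and port defaults are placeholders)
client_call <- function(msg, host = "localhost", port = server_port) {
  con <- socketConnection(host = host, port = port, blocking = TRUE, open = "r+")
  on.exit(close(con))
  writeLines(msg, con)   # send the request string
  readLines(con, 1L)     # read back the JSON string produced by check()
}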

This all works perfectly except that while check() spends ~50 ms doing its work, no further requests can be received and processed. Therefore if a new request comes in sooner than ~50 ms after the last one, it is not processed. I would therefore like to parallelize this so that the box can run more than one check() process simultaneously.  I’m familiar with several of the R parallelization packages, but I cannot see how to integrate them with the socket connection side of things.
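One direction I have been wondering about is a fork-per-request loop (an untested sketch; it assumes a Unix-alike OS, since parallel::mcparallel() relies on fork, and it reuses the server_port and check() names from above):

library(parallel)  # mcparallel()/mccollect() are fork-based, Unix only

server_forking <- function() {
  while (TRUE) {
    con <- socketConnection(host = "localhost", port = server_port,
                            blocking = TRUE, server = TRUE,
                            open = "r+", timeout = 100000000)
    data <- readLines(con, 1L, skipNul = TRUE, ok = TRUE)
    # Fork a child to run check() and reply; the parent loops back
    # to accept the next connection instead of blocking for ~50 ms.
    mcparallel({
      response <- check(data)
      if (!is.null(response)) writeLines(response, con)
      close(con)
    })
    close(con)               # parent's copy; the child's inherited fd stays open
    mccollect(wait = FALSE)  # reap any children that have already finished
  }
}

Whether the forked child can safely write to its inherited copy of the socket connection is the part I am least sure about.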

Currently I have a kludge, a round-robin approach to the problem: I have 4 copies of the whole R code listening on 4 different ports, say P1, P2, P3, P4, and the calling machine issues calls in sequence to ports P1, P2, P3, P4, P1, … etc. This mitigates, but doesn’t solve, the problem.
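For reference, the dispatch on the calling side is just a cyclic counter over the ports, something like the following (the port numbers are placeholders):

# round-robin port chooser (sketch; port numbers are placeholders)
ports <- c(6011, 6012, 6013, 6014)  # P1..P4
next_port <- local({
  i <- 0
  function() {
    i <<- (i %% length(ports)) + 1
    ports[i]
  }
})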

Any advice would be greatly appreciated!  Thanks.

James 
