[R] process id of an R script

Mikkel Grum mi2kelgrum at yahoo.com
Wed Sep 7 17:04:01 CEST 2011


I have a script that runs as a cron job every minute (on Ubuntu 10.10 with R 2.11.1), querying a database for new data. Most of the time it takes a few seconds to run, but once in a while it takes more than a minute, and the next run starts (on the same data) before the previous one has finished. In extreme cases this fills up memory with a large number of runs of the same script on the same data. My 'solution' has been to have the running script write its process id to a pid file, after first checking whether a pid file already exists and whether the process it names is still running. I use the following code:

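# Highest pid among running R processes (assumed to be this script).
# pgrep returns strings, so convert before taking the maximum.
pid <- max(as.integer(system("pgrep -x R", intern = TRUE)))
if (file.exists("/var/run/myscript.pid")) {
    old_pid <- read.table("/var/run/myscript.pid")[[1]]
    # ps -p prints a header plus one line per matching process,
    # so two lines of output mean the old process is still alive.
    if (length(system(paste("ps -p", old_pid), intern = TRUE)) == 2) {
        stop("Myscript is already running in another process.")
    } else {
        # Stale pid file: overwrite it with the current pid.
        write(pid, "/var/run/myscript.pid")
    }
} else {
    write(pid, "/var/run/myscript.pid")
}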

....my script .....

file.remove("/var/run/myscript.pid")
#The End

The trouble is that I also have other R scripts running on the same system, so while max(system("pgrep -x R", intern = TRUE)) will almost always give me the right pid, it is not guaranteed to. There are two situations where it could fail: when process ids pass the system maximum (around 32000) and start over from low numbers, so that the newest R process no longer has the highest pid; and when another R process starts at the same moment as mine, in which case it may receive the higher pid and be mistaken for my script.
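
A related wrinkle, shown as a small illustration with made-up pid values: system(..., intern = TRUE) returns character strings, so max() applied to raw pgrep output compares pids as text unless they are first converted to integers.

```r
# Hypothetical pgrep output: pids come back as character strings.
pids <- c("9999", "10000")

# Text comparison: "9999" sorts after "10000" because "9" > "1".
max_as_text <- max(pids)

# Numeric comparison after conversion picks the genuinely highest pid.
max_as_number <- max(as.integer(pids))

print(max_as_text)
print(max_as_number)
```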

Is there a way to query for the process id of the specific R script, rather than all R processes?
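
One possibility, offered as a minimal sketch rather than a tested solution: base R has Sys.getpid(), which returns the process id of the current R session, so no pgrep lookup is involved. The temporary pid file path and the helper names pid_is_running and acquire_pid_file are made up for illustration; the ps -p check mirrors the one above and assumes a Unix ps command is available.

```r
# Hypothetical pid file location; the script above uses /var/run/myscript.pid.
pid_file <- file.path(tempdir(), "myscript.pid")

# TRUE if a process with this pid currently exists (relies on the Unix ps command).
pid_is_running <- function(pid) {
    out <- suppressWarnings(system(paste("ps -p", pid), intern = TRUE))
    length(out) == 2  # header line plus one process line
}

# Claim the pid file for this session, or stop if a live process already holds it.
acquire_pid_file <- function(path) {
    if (file.exists(path)) {
        old_pid <- suppressWarnings(as.integer(readLines(path, n = 1)))
        if (length(old_pid) == 1 && !is.na(old_pid) && pid_is_running(old_pid)) {
            stop("Myscript is already running in another process.")
        }
    }
    my_pid <- Sys.getpid()  # pid of this R session only
    writeLines(as.character(my_pid), path)
    invisible(my_pid)
}

my_pid <- acquire_pid_file(pid_file)

# ... the script's real work would go here ...

file.remove(pid_file)
```

The check and the write are not atomic, so two runs that start within the same instant could still both pass the check. A pid left in a stale file could also have been reused by an unrelated process.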

Mikkel


