[R-sig-hpc] Determining the maximum memory usage of a function

Ramon Diaz-Uriarte rdiaz02 at gmail.com
Fri Jun 21 02:20:44 CEST 2013


Dear Jonathan,

You mention parallel execution, so I assume that you want to find out the
maximum memory consumed by all of your R processes together. One option
would be to call gc() at the end of each of your processes: the "max used"
column it returns gives the peak memory used by R since the process started
(or since the last call to gc(reset = TRUE), which you can use at the start
of each chunk). So return those values and add them up. I am not sure this
will work well if you use forking, though: I think you can overestimate the
real memory consumed by a large margin, as many pages might be shared.
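If you would rather collect per-process peaks from outside R, a rough OS-level analogue on Linux is to read each worker's peak resident set size (the VmHWM field of /proc/<pid>/status) while the worker is still running, and add those up. This is a swapped-in technique, not what gc() reports: it covers all resident memory of the process, including memory allocated by C code and shared libraries, and it has the same forking caveat, since copy-on-write pages shared between forked workers are counted once per process. A minimal sketch, with a hypothetical function name:

```shell
#!/bin/bash
# Sketch (Linux only): add up the peak resident set size (VmHWM, in KiB)
# of a list of running processes. sum_peak_rss_kib is a hypothetical name.
# Pages shared between forked workers are counted once per process,
# so for forked workers the total can overestimate real usage.
sum_peak_rss_kib() {
  local total=0 pid hwm
  for pid in "$@"; do
    # The status line looks like "VmHWM:     12345 kB"
    hwm=$(awk '/^VmHWM:/ {print $2}' "/proc/$pid/status" 2>/dev/null || true)
    # Skip processes that have exited or have no VmHWM (kernel threads)
    if [ -n "$hwm" ]; then
      total=$((total + hwm))
    fi
  done
  echo "$total"
}

# Example: peak RSS (KiB) of the current shell
sum_peak_rss_kib "$$"
```

VmHWM disappears when a process exits, so the values have to be read before the workers finish.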


What I do instead is measure from the shell. Since summing the memory used
by different processes without double counting (i.e., properly accounting
for shared memory pages, dynamically loaded code, etc.) is not easy, I do
it the other way around. I assume (or rather, I try to ensure) that no
other processes start running or consuming a lot of memory (or,
conversely, quit and free a lot of memory) while the ones I am
benchmarking run. I then periodically record the total amount of free
memory (as reported by free, adding buffers and cached) and, at the end,
subtract the minimum from the maximum (one might as well subtract the
minimal amount of free memory from the initial amount of free memory).
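The arithmetic at the end does not strictly need R. A minimal sketch in awk (summarize_free_ram is a hypothetical name), assuming a file with one sample per line in KiB, the default unit of free; it prints both differences, max minus min and initial minus min, converted to GiB:

```shell
#!/bin/bash
# Sketch: summarize a file of free-memory samples (one KiB value per line).
# Prints max-min and first-min, converted from KiB to GiB (divide by 1024^2).
# summarize_free_ram is a hypothetical helper name.
summarize_free_ram() {
  awk 'NF == 0 { next }                   # skip blank lines
       { v = $1 + 0 }                     # force a numeric value
       n++ == 0 { first = max = min = v } # first sample initializes all three
       v > max { max = v }
       v < min { min = v }
       END {
         if (n == 0) { print "no samples" > "/dev/stderr"; exit 1 }
         printf "max-min: %.3f GiB, first-min: %.3f GiB\n", (max - min) / 1024^2, (first - min) / 1024^2
       }' "$1"
}
```

For example, `summarize_free_ram free.RAM.txt`; for samples of 3, 4, 1 and 2 GiB (3145728, 4194304, 1048576 and 2097152 KiB) it prints `max-min: 3.000 GiB, first-min: 2.000 GiB`.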



So there are two main pieces:

a) Launching, in the background, an infinite loop that periodically calls
"free" and appends the free column of its "-/+ buffers/cache" line (in KiB)
to a file (in this case, "free.RAM.txt"). Something like

while true; do free | grep 'buffers/cache' | awk '{print $4}' >> free.RAM.txt; sleep 0.5; done &


b) Killing that as soon as your R process is done, and getting the max,
the min, and the difference (or the min, the initial, and the difference,
which should be fairly similar). I do that from the shell too, calling
R. For instance:


## free reports KiB, so dividing by 1024^2 gives GiB
Rscript --vanilla -e 'tmp <- scan("free.RAM.txt"); usage <- (max(tmp) - min(tmp))/(1024^2); cat(usage)'
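Newer versions of free from procps-ng no longer print the "-/+ buffers/cache" line, so on such systems the grep in piece a) matches nothing and free.RAM.txt stays empty. A variant of the sampling loop that reads /proc/meminfo directly, and so does not depend on free's output layout, is sketched below under the assumption of a Linux system; sample_free_ram is a hypothetical name, and MemFree + Buffers + Cached only approximates what the old line reported:

```shell
#!/bin/bash
# Sketch (Linux only): a variant of piece a) that reads /proc/meminfo
# instead of parsing free's output. It appends MemFree + Buffers + Cached
# (KiB), roughly what the old "-/+ buffers/cache" line reported, to a file.
# sample_free_ram is a hypothetical helper name.
sample_free_ram() {
  local out=$1 interval=${2:-0.5}
  while true; do
    awk '/^(MemFree|Buffers|Cached):/ {s += $2} END {print s}' /proc/meminfo >> "$out"
    sleep "$interval"
  done
}

# Usage, mirroring the original loop:
#   sample_free_ram free.RAM.txt 0.5 &
#   FREE_RAM_PID=$!
#   ... run the R script ...
#   kill $FREE_RAM_PID
```

The `^` anchors keep SwapCached from being counted together with Cached.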



Here is the shell script I use that puts together the above. It takes as
input the name of the R script whose memory usage I want to measure, and
writes a short summary at the end.



#############################

RBIN=~/mysources/R-3.0.1-B/bin/R ## path to the R binary whose memory use you are measuring

SCRIPT=$1
POST=$(date +"%H-%M_%m-%d-%Y")

rm -f free.RAM.txt

## Decrease the value after sleep if you are concerned about
## missing a peak

while true; do free | grep 'buffers/cache' | awk '{print $4}' >> free.RAM.txt; sleep 0.5; done &
FREE_RAM_PID=$!

"$RBIN" --vanilla < "$SCRIPT" > "$SCRIPT.$POST.Rout"

kill $FREE_RAM_PID

## free reports KiB, so dividing by 1024^2 gives the usage in GiB
TOT_RAM_USAGE=$(Rscript --vanilla -e 'tmp <- scan("free.RAM.txt"); usage <- (max(tmp) - min(tmp))/(1024^2); cat(usage)')
mv free.RAM.txt free.RAM.txt.$SCRIPT.$POST

echo
echo
echo $TOT_RAM_USAGE

echo "Total RAM usage (GiB) = $TOT_RAM_USAGE" >> "$SCRIPT.$POST.summary"

#########################
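Jonathan's goal below, a table of chunk size versus peak memory, needs this measurement repeated for many runs, so the same idea can also be wrapped around an arbitrary command rather than a single R script. A minimal sketch, assuming Linux with /proc/meminfo and the same caveat that nothing else on the machine changes its memory use much during the run; the helper names are hypothetical, and sampling every 0.2 s is an arbitrary choice:

```shell
#!/bin/bash
# Sketch (Linux only). Hypothetical helper names throughout.

# free + buffers + page cache, in KiB
free_plus_cache_kib() {
  awk '/^(MemFree|Buffers|Cached):/ {s += $2} END {print s}' /proc/meminfo
}

# Run "$@" while sampling free_plus_cache_kib; print the largest drop
# (first sample minus minimum) in MiB on stdout. The command's own
# stdout is sent to stderr so stdout carries only the measurement.
peak_mem_mib() {
  local samples sampler status=0
  samples=$(mktemp)
  free_plus_cache_kib > "$samples"        # baseline, taken before the command
  while true; do
    free_plus_cache_kib >> "$samples"
    sleep 0.2
  done > /dev/null 2>&1 &
  sampler=$!
  "$@" >&2 || status=$?                   # run the command being measured
  kill "$sampler"
  wait "$sampler" 2>/dev/null || true     # reap the sampler loop
  awk 'NF == 0 { next }
       { v = $1 + 0 }
       n++ == 0 { first = min = v }
       v < min { min = v }
       END { printf "%.1f\n", (first - min) / 1024 }' "$samples"
  rm -f "$samples"
  return "$status"
}

# Example: memory drop while a trivial command runs
peak_mem_mib sleep 1
```

For the chunk-size table, something like `for n in 1000 10000 100000; do echo "$n $(peak_mem_mib Rscript --vanilla chunk_test.R "$n")"; done` could be used, where chunk_test.R is a hypothetical script that processes a single chunk of size n. Because the measured command's output goes to stderr, the command substitution captures only the MiB figure, and the function returns the command's exit status.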

Best,

R.


P.S. I've removed r-help from the addresses to avoid cross-posting.

On Thu, 20 Jun 2013 09:45:39 -0500,Jonathan Greenberg <jgrn at illinois.edu> wrote:
> Folks:

> I apologize for the cross-posting between r-help and r-sig-hpc, but I
> figured this question was relevant to both lists.  I'm writing a
> function to be applied to an input dataset that will be broken up into
> chunks for memory management reasons and for parallel execution.  I am
> trying to determine, for a given function, what the *maximum* memory
> usage during its execution is (which may not be the beginning or the
> end of the function, but somewhere in the middle), so I can "plan" for
> the chunk size (e.g. have a table of chunk size vs. max memory usage).

> Is there a trick for determining this?

> --j

> --
> Jonathan A. Greenberg, PhD
> Assistant Professor
> Global Environmental Analysis and Remote Sensing (GEARS) Laboratory
> Department of Geography and Geographic Information Science
> University of Illinois at Urbana-Champaign
> 607 South Mathews Avenue, MC 150
> Urbana, IL 61801
> Phone: 217-300-1924
> http://www.geog.illinois.edu/~jgrn/
> AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307, Skype: jgrn3007

> _______________________________________________
> R-sig-hpc mailing list
> R-sig-hpc at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
-- 
Ramon Diaz-Uriarte
Department of Biochemistry, Lab B-25
Facultad de Medicina 
Universidad Autónoma de Madrid 
Arzobispo Morcillo, 4
28029 Madrid
Spain

Phone: +34-91-497-2412

Email: rdiaz02 at gmail.com
       ramon.diaz at iib.uam.es

http://ligarto.org/rdiaz


