[Rd] Measuring and comparing .C and .Call overhead

Tue Jul 19 17:07:35 CEST 2011

Further pursuing my curiosity to measure the efficiency of R/C++ interface, I
conducted a simple matrix-vector multiplication test using .C and .Call
functions in R. In each case, I measured the execution time in R, as well as
inside the C++ function. Subtracting the two, I came up with a measure of
overhead associated with each call. I assume that this overhead would be
non-existent of the entire code was implemented in C++. [Please let me know
if my approach is somehow faulty.] The results of this simple test have
significant implications for my software development approach. I hope others
find this information useful as well. I have attached the code files in the
bottom. Below I provide a summary output, and interpret the results. [All
times in table below are in microseconds]

m/n/time_.Call_R/time_.Call_C++/.Call_overhead/time_.C_R/time_.C_C++/.C_overhead
100/10/3.23/1.33/*1.90*/16.89/0.96/*15.93*
200/15/5.38/3.44/*1.94*/30.77/2.81/*27.96*
500/20/12.48/10.4/*2.08*/78.12/9.37/*68.75*

Interpretation:

1- .Call overhead holds nearly constant, i.e. independent of data size. This
is expected since .Call works by passing pointers. (How can we explain the
slight increase in overhead?)
2- C++ times for .C are somewhat better than .Call. This is likely to be due
to the overhead associated with unpacking the SEXP pointers in a .Call
function.
3- The overhead for .C dominates the execution time. For a 500x20 matrix,
the overhead is ~90% of total time. This means that whenever we need to make
repeated calls to a C/C++ function from R, and when performance is important
to us, .Call is much preferred to .C, even at modest data sizes.
4- Overhead for .C scales sub-linearly with data size. I imagine that this
overhead can be reduced through registering the functions and using the
style field in R_CMethodDef to optimize data transfer (per Section 5.4 of
"Writing R Extensions"), but perhaps not by more than a half.
5- Even using .Call, the overhead is still a significant percentage of total
time, though the contribution decreases with data size (and compute load of
function). However, in the context of parallelization benefits, this
overhead puts an unpleasant cap on achievable gains. For example, even if
the compute time goes down by 100x, if overhead was even 10% of the time,
our effective gain will saturate at 10x. In other words, effective
parallelization can only be achieved by moving big chunks of the code to
C/C++ so as to avoid repeated calls from R to C.

I look forward to your comments.

-Alireza
--------------------------------
#code file: mvMultiply.r
#measuring execution time for a matrix-vector multiplication (A%*%x) using
.C and .Call functions
#execution time is measured both in R and inside the C++ functions;
subtracting the two is 
#used as an indicator of overhead associated with each call

rm(list = ls())
system("R CMD SHLIB mvMultiply.cc")
dyn.load("mvMultiply.so")

m <- 100 #number of rows in matrix A
n <- 10 #number of columns in matrix A (= number of rows in vector x)
N <- 1000000

A <- runif(m*n)
x <- runif(n)

# measuring .Call
tCall_c <- 0.0
t1 <- proc.time()[3]
for (i in 1:N) {
	tCall_c <- tCall_c + .Call("matvecMultiply", as.double(A), as.double(x))
}
tCall_R <- proc.time()[3] - t1
cat(".Call - Time measured in R: ", round(tCall_R,2), "sec\n")
cat(".Call - Time measured in C++: ", round(tCall_c,2), "sec\n")
cat(".Call - Implied overhead: ", round(tCall_R,2) - round(tCall_c,2), "sec 
->  per call: ", 1000000*(round(tCall_R,2) - round(tCall_c,2))/N, "usec\n")

# measuring .C
tC_c <- 0.0
t1 <- proc.time()[3]
for (i in 1:N) {
	tC_c <- tC_c + .C("matvecMultiply2", as.double(A), as.double(x),
double(c(m)), as.integer(c(m)), as.integer(c(n)), t = double(1))$t
}
tC_R <- proc.time()[3] - t1
cat(".C - Time measured in R: ", round(tC_R,2), "sec\n")
cat(".C - Time measured in C++: ", round(tC_c,2), "sec\n")
cat(".C - Implied overhead: ", round(tC_R,2) - round(tC_c,2), "sec  ->  per
call: ", 1000000*(round(tC_R,2) - round(tC_c,2))/N, "usec\n")

dyn.unload("mvMultiply.so")
--------------------------------
#code file: myMultiply.cc
#include <Rinternals.h>
#include <R.h>
#include <sys/time.h>

extern "C" {

SEXP matvecMultiply(SEXP A, SEXP x) {
	timeval time1, time2;
	gettimeofday(&time1, NULL);
	SEXP execTime_sxp;
	PROTECT(execTime_sxp = allocVector(REALSXP, 1));
	double *execTime; execTime = REAL(execTime_sxp);
	double *rA = REAL(A), *rx = REAL(x), *ry;
	int n = length(x);
	int m = length(A) / n;
	SEXP y;
	PROTECT(y = allocVector(REALSXP, m));
	ry = REAL(y);
	for (int i = 0; i < m; i++) {
		ry[i] = 0.0;
		for (int j = 0; j < n; j++) {
			ry[i] += rA[j * m + i] * rx[j];
		}
	}
	UNPROTECT(1);
	gettimeofday(&time2, NULL);
	*execTime = (time2.tv_sec - time1.tv_sec) + (time2.tv_usec -
time1.tv_usec)/1000000.0;
	UNPROTECT(1);
	return execTime_sxp;
}

void matvecMultiply2(double *A, double *x, double *y, int *m_ptr, int
*n_ptr, double *execTime) {
	timeval time1, time2;
	gettimeofday(&time1, NULL);
	int m = *m_ptr;
	int n = *n_ptr;
	for (int i = 0; i < m; i++) {
		y[i] = 0.0;
		for (int j = 0; j < n; j++) {
			y[i] += A[j * m + i] * x[j];
		}
	}
	gettimeofday(&time2, NULL);
	*execTime = (time2.tv_sec - time1.tv_sec) + (time2.tv_usec -
time1.tv_usec)/1000000.0;
}

}
--------------------------------

--
View this message in context: http://r.789695.n4.nabble.com/Measuring-and-comparing-C-and-Call-overhead-tp3678361p3678361.html
Sent from the R devel mailing list archive at Nabble.com.