[Rd] Use GPU in R with .Call

Sun Jul 22 01:35:36 CEST 2012

Hi All,

      I am new to GPU programming. Can anyone help me use the GPU from R
through .Call?

      Basically, I want to write a function that calculates the element-wise
sum of two double vectors on the GPU. My final goal is to make this
implementation callable from R.

      (a) First, I wrote an R-C interface that handles the R objects using
.Call (saved as VecAdd_cuda.c), as below.

=================file VecAdd_cuda.c===================
#include <R.h>
#include <Rinternals.h>

/************************************************/
/* "VecAdd_cuda.c" adds two double vectors using GPU. */
/************************************************/
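/* Host wrapper defined in VecAdd_kernel.cu. */
extern void vecAdd_kernel(double *ain,double *bin,double *cout,int len);

SEXP VecAdd_cuda(SEXP a,SEXP b) {
	int len;
	double *a_ptr,*b_ptr,*resout_ptr;
	SEXP resout;

	/*REAL() is only valid on double vectors, and both must have the same length*/
	if (!isReal(a) || !isReal(b) || length(a)!=length(b))
		error("a and b must be double vectors of the same length");

	/*Digest R objects*/
	len=length(a);
	a_ptr=REAL(a);
	b_ptr=REAL(b);

	/*Allocate the result and protect it from the garbage collector*/
	PROTECT(resout=allocVector(REALSXP,len));
	resout_ptr=REAL(resout);

	vecAdd_kernel(a_ptr,b_ptr,resout_ptr,len);
	UNPROTECT(1);
	return resout;
}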

     (b) Next, the host function and the kernel are in a *SEPARATE* file
called "VecAdd_kernel.cu".

     =======================file VecAdd_kernel.cu========================
#define THREAD_PER_BLOCK 100

__global__ void VecAdd(double *a,double *b, double *c,int len) { 

    int idx = threadIdx.x + blockIdx.x * blockDim.x;
	if (idx<len){
	    c[idx] = a[idx] + b[idx]; 
	}
}

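/* nvcc compiles .cu files as C++, so export the host wrapper with C linkage;
   otherwise its name is mangled and the C code in VecAdd_cuda.c cannot link to it. */
extern "C" void vecAdd_kernel(double *ain,double *bin,double *cout,int len){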
   int alloc_size;
   alloc_size=len*sizeof(double);

   /*Step 0a) Declare device copies of ain, bin, and cout.*/
   double *a_copy,*b_copy,*cout_copy;

   /*Step 0b) Allocate memory for device copies.*/
   cudaMalloc(&a_copy,alloc_size);
   cudaMalloc(&b_copy,alloc_size);
   cudaMalloc(&cout_copy,alloc_size);

   /*Step 0c) Copy the inputs to the device. cout is output only, so it
needs no host-to-device copy.*/
   cudaMemcpy(a_copy,ain,alloc_size,cudaMemcpyHostToDevice);
   cudaMemcpy(b_copy,bin,alloc_size,cudaMemcpyHostToDevice);

   /*Step 1) Execute kernel with enough blocks to cover all len elements.*/
   VecAdd<<<(len+THREAD_PER_BLOCK-1)/THREAD_PER_BLOCK,THREAD_PER_BLOCK>>>(a_copy,b_copy,cout_copy,len);

   /*Step 2) Copy result back to host.*/
   cudaMemcpy(cout,cout_copy,alloc_size,cudaMemcpyDeviceToHost);

   /*Step 3) Deallocate memory for device copies.*/
   cudaFree(a_copy);
   cudaFree(b_copy);
   cudaFree(cout_copy);

   /*Step 4) Get rid of the CUDA context, necessary to avoid a segfault when R
exits. cudaDeviceReset() replaces cudaThreadExit(), deprecated since CUDA 4.0.*/
   cudaDeviceReset();
}
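
     To check the GPU result, or to test the .Call interface on a machine
without a GPU, a plain C reference that mirrors the kernel's indexing might
look like the sketch below. The names blocks_for and vecAdd_reference are
hypothetical and not part of the files above; only THREAD_PER_BLOCK and the
idx<len guard are taken from VecAdd_kernel.cu.

```c
#include <assert.h>

#define THREAD_PER_BLOCK 100

/* Number of blocks needed to cover len elements (ceiling division),
   the same expression used in the kernel launch. */
static int blocks_for(int len) {
    return (len + THREAD_PER_BLOCK - 1) / THREAD_PER_BLOCK;
}

/* Hypothetical CPU stand-in for the VecAdd kernel: the two loops play the
   roles of blockIdx.x and threadIdx.x, and blockDim.x is THREAD_PER_BLOCK. */
static void vecAdd_reference(const double *a, const double *b, double *c, int len) {
    assert(len >= 0);
    int nblocks = blocks_for(len);
    for (int block = 0; block < nblocks; ++block) {
        for (int thread = 0; thread < THREAD_PER_BLOCK; ++thread) {
            int idx = thread + block * THREAD_PER_BLOCK;
            if (idx < len) {   /* same guard as in the kernel */
                c[idx] = a[idx] + b[idx];
            }
        }
    }
}
```

     For len=250 this uses 3 blocks, and the guard skips threads 50 to 99 of
the last block, so nothing is written past the end of c.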

     (c) Lastly, I wrote an R wrapper function, saved as "VecAdd_cuda.R", as
below.
     ======================file VecAdd_cuda.R===========================
#***************************************#
# This is R wrapper function for VecAdd_cuda.   #
#***************************************# 
VecAdd_cuda<-function(a,b){

    if (!is.vector(a) || !is.vector(b) || length(a)!=length(b)){
         stop("In VecAdd_cuda.R, a and b should be vectors of the same length!");
    }

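    #load the compiled library on first use
    if (!is.loaded('VecAdd_cuda')){
        lib.file<-file.path(paste("VecAdd_cuda",.Platform$dynlib.ext,sep=""));
        dyn.load(lib.file);
        cat(" -Loaded ", lib.file, "\n");
    }
    #coerce to double, since the C code reads the inputs with REAL()
    .Call("VecAdd_cuda",as.double(a),as.double(b));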
}

==================================================

     I am using a 64-bit Windows 7 machine. My laptop has a CUDA-capable
graphics card with compute capability 2.0 (i.e., double precision is
supported). I have everything needed to compile .cu files installed
correctly, and I can compile "VecAdd_kernel.cu" into "VecAdd_kernel.o"
with the command
          nvcc -c -m64 -arch=sm_13 VecAdd_kernel.cu -o VecAdd_kernel.o 

     But my question is: how do I compile "VecAdd_cuda.c" and get a .dll
file that I can dynamically load into R? I tried the following
      R CMD SHLIB VecAdd_cuda.c 
   and got the message
                  "undefined reference to vecAdd_kernel, collect2:ld
returned 1 exit status". 
      My understanding is that since the function "vecAdd_kernel" is defined
in a different file from "VecAdd_cuda.c", I need to link VecAdd_cuda.c with
VecAdd_kernel.cu or VecAdd_kernel.o. But I do not know how!

      Does anyone know what I should do to get a "VecAdd_cuda.dll" that I can
load into R? Thanks!
