[R] multithreading calling from the rpy Python package

René J.V. Bertin rjvbertin at gmail.com
Thu Oct 12 18:08:11 CEST 2006


Hello,

I don't know if this question ought to go here, or rather on R-devel,
so please bear with me.

I'm interfacing to R via RPy (rpy.sf.net) and an embedded Python
interpreter. This is really quite convenient.

I use this approach to calculate the correlation coefficients of one
independent dataset (vector) with each of four dependent vectors. It'd be
nice if that could be done in four parallel threads, or even two.

As long as I stick to pure Python code (using equivalents to R
routines that can be found in Numpy and SciPy), this works fine.
(Tested on a single-core machine.) However, when I call R functions
through rpy, a crash will occur at some point, with the error

*** caught segfault ***
address 0x5164000, cause 'memory not mapped'

(this is on Mac OS X 10.4.8), somewhere in Rf_eval:
Thread 4 Crashed:
0   libR.dylib        	0x03676af0 Rf_eval + 128
1   libR.dylib        	0x03676e6c Rf_eval + 1020
2   libR.dylib        	0x03677108 Rf_eval + 1688
3   libR.dylib        	0x03676e6c Rf_eval + 1020
4   libR.dylib        	0x03677108 Rf_eval + 1688
5   libR.dylib        	0x03676e6c Rf_eval + 1020
6   libR.dylib        	0x03677108 Rf_eval + 1688
7   libR.dylib        	0x03678144 Rf_evalList + 148
8   libR.dylib        	0x036bb5cc do_internal + 796
9   libR.dylib        	0x03676fbc Rf_eval + 1356
10  libR.dylib        	0x0367ad10 Rf_applyClosure + 1120
11  libR.dylib        	0x03676e3c Rf_eval + 972
12  libR.dylib        	0x0367ad10 Rf_applyClosure + 1120
13  libR.dylib        	0x03676e3c Rf_eval + 972
14  libR.dylib        	0x0367a110 do_if + 48
15  libR.dylib        	0x03676fbc Rf_eval + 1356
16  libR.dylib        	0x0367932c do_begin + 108
17  libR.dylib        	0x03676fbc Rf_eval + 1356
18  libR.dylib        	0x0367ad10 Rf_applyClosure + 1120
19  libR.dylib        	0x03676e3c Rf_eval + 972
20  libR.dylib        	0x0361b7c0 protectedEval + 64
21  libR.dylib        	0x0361c170 R_ToplevelExec + 544
22  libR.dylib        	0x0361c22c R_tryEval + 60
23  _rpy2031.so       	0x032f0b8c do_eval_expr + 108
>> 24  _rpy2031.so       	0x032ef950 Robj_call + 688
25  Python2.5         	0x023c6c08 PyObject_Call + 56
26  Python2.5         	0x024a68ec PyEval_EvalFrameEx + 16844
27  Python2.5         	0x024a8cf8 PyEval_EvalFrameEx + 26072
28  Python2.5         	0x024aaef8 PyEval_EvalCodeEx + 3512
29  Python2.5         	0x024a7ce0 PyEval_EvalFrameEx + 21952
30  Python2.5         	0x024a8cf8 PyEval_EvalFrameEx + 26072
31  Python2.5         	0x024aaef8 PyEval_EvalCodeEx + 3512
32  Python2.5         	0x023fbb88 function_call + 472
33  Python2.5         	0x023c6c08 PyObject_Call + 56
34  Python2.5         	0x023d3294 instancemethod_call + 388
35  Python2.5         	0x023c6c08 PyObject_Call + 56
36  Python2.5         	0x024a0cf4 PyEval_CallObjectWithKeywords + 276
37  Python2.5         	0x024f244c t_bootstrap + 60
38  libSystem.B.dylib 	0x9002b508 _pthread_body + 96


Is this because R itself isn't thread-safe, or maybe the R code I'm
calling? I've found discussions on "why should we make R thread-safe
and how" on the website, but there appears to be no date on these
documents.

The R/Python wrapper functions I'm using:

# A variance calculator that returns 0 for vectors that have only 1
# non-NaN element:
from numpy import isnan  # assumption: isnan comes from Numpy
import rpy

def vvar(a):
    v = rpy.r.var(a, na_rm=True)
    if isnan(v):
        return 0
    return v

# Calculate the Spearman Rho correlation between a and b and return the result
# as scipy.stats.stats.spearmanr() does:
R_spearmanr = rpy.r('function(a,b){ kk<-cor.test(a,b,method="spearman"); '
                    'c( kk$estimate[[1]], kk$p.value) ; }')

I'm taking care to make copies of the arrays I'm correlating when
initialising the threads. (I can post more of the Python code, if
required.)
I'm using R 2.3.1.
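
To give an idea of the structure, here is a minimal sketch of the thread
setup. It is not my actual code: the names CorrelationThread, correlate_all
and pearson_r are made up for illustration, and a pure-Python Pearson
correlation stands in for the call into R. Each thread gets its own copies of
the vectors it correlates.

```python
import threading
from math import sqrt


def pearson_r(a, b):
    """Pure-Python Pearson correlation; a stand-in for the R call."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


class CorrelationThread(threading.Thread):
    """Correlates private copies of one independent and one dependent vector."""

    def __init__(self, independent, dependent):
        threading.Thread.__init__(self)
        # Copy the inputs so that no two threads share a list object.
        self.independent = list(independent)
        self.dependent = list(dependent)
        self.result = None

    def run(self):
        self.result = pearson_r(self.independent, self.dependent)


def correlate_all(independent, dependents):
    """Run one thread per dependent vector and collect the results in order."""
    threads = [CorrelationThread(independent, d) for d in dependents]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [t.result for t in threads]
```

In the failing version, the body of run() calls one of the rpy wrappers above
instead of the pure-Python routine.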

Thanks in advance,
René

(as always, please CC me on replies sent to the list, thanks!)


