[R] multithreading calling from the rpy Python package
René J.V. Bertin
rjvbertin at gmail.com
Thu Oct 12 18:08:11 CEST 2006
Hello,
I don't know if this question ought to go here, or rather on R-devel,
so please bear with me.
I'm interfacing to R via RPy (rpy.sf.net) and an embedded Python
interpreter. This is really quite convenient.
I use this approach to calculate the correlation coefficient of one
independent dataset (a vector) with each of 4 dependent vectors. It'd be
nice if that could be done in 4 parallel threads, or even two.
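The threading layout described above can be sketched roughly as follows. This is only a minimal sketch, not the actual code: `pearson_r` and `correlate_all` are hypothetical names, and a plain-Python Pearson correlation stands in for whatever routine (Numpy/SciPy or R) does the real work. Each thread receives its own copy of the data.

```python
import threading

def pearson_r(x, y):
    """Plain-Python Pearson correlation; a stand-in for the real routine."""
    n = len(x)
    mx = sum(x) / float(n)
    my = sum(y) / float(n)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def correlate_all(independent, dependents, corr=pearson_r):
    """Correlate one independent vector with each dependent vector,
    using one thread per dependent vector."""
    results = [None] * len(dependents)

    def worker(i, x, y):
        # Each worker writes only to its own slot in the results list.
        results[i] = corr(x, y)

    threads = [
        # list(...) gives every thread a private copy of its inputs.
        threading.Thread(target=worker, args=(i, list(independent), list(dep)))
        for i, dep in enumerate(dependents)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Passing a different function as `corr` swaps in another correlation routine without changing the threading code.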
As long as I stick to pure Python code (using equivalents to R
routines that can be found in Numpy and SciPy), this works fine.
(Tested on a single-core machine.) However, when I call R functions
through rpy, a crash will occur at some point, with the error
*** caught segfault ***
address 0x5164000, cause 'memory not mapped'
(this is on Mac OS X 10.4.8), somewhere in Rf_eval:
Thread 4 Crashed:
0 libR.dylib 0x03676af0 Rf_eval + 128
1 libR.dylib 0x03676e6c Rf_eval + 1020
2 libR.dylib 0x03677108 Rf_eval + 1688
3 libR.dylib 0x03676e6c Rf_eval + 1020
4 libR.dylib 0x03677108 Rf_eval + 1688
5 libR.dylib 0x03676e6c Rf_eval + 1020
6 libR.dylib 0x03677108 Rf_eval + 1688
7 libR.dylib 0x03678144 Rf_evalList + 148
8 libR.dylib 0x036bb5cc do_internal + 796
9 libR.dylib 0x03676fbc Rf_eval + 1356
10 libR.dylib 0x0367ad10 Rf_applyClosure + 1120
11 libR.dylib 0x03676e3c Rf_eval + 972
12 libR.dylib 0x0367ad10 Rf_applyClosure + 1120
13 libR.dylib 0x03676e3c Rf_eval + 972
14 libR.dylib 0x0367a110 do_if + 48
15 libR.dylib 0x03676fbc Rf_eval + 1356
16 libR.dylib 0x0367932c do_begin + 108
17 libR.dylib 0x03676fbc Rf_eval + 1356
18 libR.dylib 0x0367ad10 Rf_applyClosure + 1120
19 libR.dylib 0x03676e3c Rf_eval + 972
20 libR.dylib 0x0361b7c0 protectedEval + 64
21 libR.dylib 0x0361c170 R_ToplevelExec + 544
22 libR.dylib 0x0361c22c R_tryEval + 60
23 _rpy2031.so 0x032f0b8c do_eval_expr + 108
>> 24 _rpy2031.so 0x032ef950 Robj_call + 688
25 Python2.5 0x023c6c08 PyObject_Call + 56
26 Python2.5 0x024a68ec PyEval_EvalFrameEx + 16844
27 Python2.5 0x024a8cf8 PyEval_EvalFrameEx + 26072
28 Python2.5 0x024aaef8 PyEval_EvalCodeEx + 3512
29 Python2.5 0x024a7ce0 PyEval_EvalFrameEx + 21952
30 Python2.5 0x024a8cf8 PyEval_EvalFrameEx + 26072
31 Python2.5 0x024aaef8 PyEval_EvalCodeEx + 3512
32 Python2.5 0x023fbb88 function_call + 472
33 Python2.5 0x023c6c08 PyObject_Call + 56
34 Python2.5 0x023d3294 instancemethod_call + 388
35 Python2.5 0x023c6c08 PyObject_Call + 56
36 Python2.5 0x024a0cf4 PyEval_CallObjectWithKeywords + 276
37 Python2.5 0x024f244c t_bootstrap + 60
38 libSystem.B.dylib 0x9002b508 _pthread_body + 96
Is this because R itself isn't thread-safe, or maybe the R code I'm
calling? I've found discussions on "why should we make R thread-safe
and how" on the website, but there appears to be no date on these
documents.
The R/Python wrapper functions I'm using:
# A variance calculator that returns 0 for vectors that have only 1
# non-NaN element:
import rpy
from numpy import isnan  # assuming isnan is Numpy's

def vvar(a):
    v = rpy.r.var(a, na_rm=True)
    if isnan(v):
        return 0
    return v
# Calculate the Spearman rho correlation between a and b and return the result
# as scipy.stats.stats.spearmanr() does:
R_spearmanr = rpy.r('''function(a, b) {
    kk <- cor.test(a, b, method="spearman")
    c(kk$estimate[[1]], kk$p.value)
}''')
I'm taking care to make copies of the arrays I'm correlating when
initialising the threads. (I can post more of the Python code, if
required.)
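If the crash comes from two threads entering the R interpreter at the same time (an assumption; nothing above confirms it), one possible workaround is to serialise every call into R behind a single lock. A minimal sketch, in which `_r_lock` and `serialised` are hypothetical names and not part of rpy:

```python
import threading

# Hypothetical: one process-wide lock guarding every call into R.
_r_lock = threading.Lock()

def serialised(func):
    """Wrap func so that only one thread at a time can be inside it."""
    def wrapper(*args, **kwargs):
        with _r_lock:
            return func(*args, **kwargs)
    return wrapper
```

The wrappers would then be used as, for example, `R_spearmanr = serialised(R_spearmanr)`. This gives up any parallelism inside the R calls themselves, so it would only help if the threads also do a fair amount of work outside R.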
I'm using R 2.3.1.
thanks in advance,
René
(as always, please CC me on replies sent to the list, thanks!)