[R] multithreading calling from the rpy Python package

Duncan Temple Lang duncan at wald.ucdavis.edu
Thu Oct 12 18:43:01 CEST 2006



[Taken from below]
> Is this because R itself isn't thread-safe, or maybe the R code I'm
> calling? I've found discussions on "why should we make R thread-safe
> and how" on the website, but there appears to be no date on these
> documents.
>

It is a mixture of two things. Yes, R is not thread-safe, so if
two system threads were to access R concurrently, bad things would
happen a.s. (almost surely).
It is also an issue when Python is compiled and linked with
threading options and routines from the system, e.g. libpthread,
and R is not.  When R is dynamically loaded into the Python
process, unless R is very carefully compiled, symbols (i.e. routines)
that R uses will be resolved from the Python executable, and these
may not agree with what R expected when it was compiled. And bad
things happen.
This depends on your operating system, and it doesn't appear that
you have told us what that is. Bad boy :-)
This is an issue with Rpy, RSPython, RSPerl, the R Apache module, rJava, etc.

I have started down the road of making R thread-safe and
threaded on several occasions.  I have not committed these
extensive changes for a variety of reasons. One is that a lot
of R internals would change and this would have an impact on
packages with native code. So we need a way to, at least partially,
automate this for package authors.  I have been making a lot of
progress on that front recently with the RGCCTranslationUnit package,
which allows us to examine C/C++ code from within R.

[The following is definitely for R-devel, so anyone replying,
please remove the r-help and cc r-devel at r-project.org]

And one of the issues that also makes me hesitate in doing this
is whether we shouldn't take the time to introduce additional
extensive changes in the architecture of an R-like interpreter,
e.g. make it extensible at the native level.  For statistical computing
to continue to grow and for all of us to be able to explore newer
areas, we probably need to think about building infrastructure for the
next 5-10 years and not continue to tweak a model that has been around
for 30 years.  How we do this requires some serious thought
and evaluating the trade-offs between building things ourselves with a
small community and leveraging other existing or emerging systems, e.g.
Python, Perl6/Parrot, etc.

 My $.02

  D.




René J.V. Bertin wrote:
> Hello,
> 
> I don't know if this question ought to go here, or rather on R-devel,
> so please bear with me.
> 
> I'm interfacing to R via RPy (rpy.sf.net) and an embedded Python
> interpreter. This is really quite convenient.
> 
> I use this approach to calculate the correlation coefficient of 1
> independent dataset (vector) with 4 dependent vectors. It'd be nice if
> that could be done in 4 parallel threads, or even two.
> 
> As long as I stick to pure Python code (using equivalents to R
> routines that can be found in Numpy and SciPy), this works fine.
> (Tested on a single-core machine.) However, when I call R functions
> through rpy, a crash will occur at some point, with the error
> 
> *** caught segfault ***
> address 0x5164000, cause 'memory not mapped'
> 
> (this is on Mac OS X 10.4.8), somewhere in Rf_eval:
> Thread 4 Crashed:
> 0   libR.dylib        	0x03676af0 Rf_eval + 128
> 1   libR.dylib        	0x03676e6c Rf_eval + 1020
> 2   libR.dylib        	0x03677108 Rf_eval + 1688
> 3   libR.dylib        	0x03676e6c Rf_eval + 1020
> 4   libR.dylib        	0x03677108 Rf_eval + 1688
> 5   libR.dylib        	0x03676e6c Rf_eval + 1020
> 6   libR.dylib        	0x03677108 Rf_eval + 1688
> 7   libR.dylib        	0x03678144 Rf_evalList + 148
> 8   libR.dylib        	0x036bb5cc do_internal + 796
> 9   libR.dylib        	0x03676fbc Rf_eval + 1356
> 10  libR.dylib        	0x0367ad10 Rf_applyClosure + 1120
> 11  libR.dylib        	0x03676e3c Rf_eval + 972
> 12  libR.dylib        	0x0367ad10 Rf_applyClosure + 1120
> 13  libR.dylib        	0x03676e3c Rf_eval + 972
> 14  libR.dylib        	0x0367a110 do_if + 48
> 15  libR.dylib        	0x03676fbc Rf_eval + 1356
> 16  libR.dylib        	0x0367932c do_begin + 108
> 17  libR.dylib        	0x03676fbc Rf_eval + 1356
> 18  libR.dylib        	0x0367ad10 Rf_applyClosure + 1120
> 19  libR.dylib        	0x03676e3c Rf_eval + 972
> 20  libR.dylib        	0x0361b7c0 protectedEval + 64
> 21  libR.dylib        	0x0361c170 R_ToplevelExec + 544
> 22  libR.dylib        	0x0361c22c R_tryEval + 60
> 23  _rpy2031.so       	0x032f0b8c do_eval_expr + 108
> 
>>>24  _rpy2031.so       	0x032ef950 Robj_call + 688
> 
> 25  Python2.5         	0x023c6c08 PyObject_Call + 56
> 26  Python2.5         	0x024a68ec PyEval_EvalFrameEx + 16844
> 27  Python2.5         	0x024a8cf8 PyEval_EvalFrameEx + 26072
> 28  Python2.5         	0x024aaef8 PyEval_EvalCodeEx + 3512
> 29  Python2.5         	0x024a7ce0 PyEval_EvalFrameEx + 21952
> 30  Python2.5         	0x024a8cf8 PyEval_EvalFrameEx + 26072
> 31  Python2.5         	0x024aaef8 PyEval_EvalCodeEx + 3512
> 32  Python2.5         	0x023fbb88 function_call + 472
> 33  Python2.5         	0x023c6c08 PyObject_Call + 56
> 34  Python2.5         	0x023d3294 instancemethod_call + 388
> 35  Python2.5         	0x023c6c08 PyObject_Call + 56
> 36  Python2.5         	0x024a0cf4 PyEval_CallObjectWithKeywords + 276
> 37  Python2.5         	0x024f244c t_bootstrap + 60
> 38  libSystem.B.dylib 	0x9002b508 _pthread_body + 96
> 
> 
> Is this because R itself isn't thread-safe, or maybe the R code I'm
> calling? I've found discussions on "why should we make R thread-safe
> and how" on the website, but there appears to be no date on these
> documents.
> 
> The R/Python wrapper functions I'm using:
> 
> # a variance calculator that returns 0 for vectors that have only 1 non-NaN element:
> def vvar(a):
>       v=rpy.r.var(a, na_rm=True)
>       if isnan(v):
>             return 0
>       return v
> 
> # Calculate the Spearman Rho correlation between a and b and return the result
> # as scipy.stats.stats.spearmanr() does:
> R_spearmanr=rpy.r('function(a,b){ kk<-cor.test(a,b,method="spearman");'
>                   ' c( kk$estimate[[1]], kk$p.value) ; }')
> 
> I'm taking care to make copies of the arrays I'm correlating when
> initialising the threads. (I can post more of the Python code, if
> required.)
> I'm using R 2.3.1 .
> 
> thanks in advance,
> René
> 
> (as always, please CC me on replies sent to the list, thanks!)

--
Duncan Temple Lang                    duncan at wald.ucdavis.edu
Department of Statistics              work:  (530) 752-4782
4210 Mathematical Sciences Building   fax:   (530) 752-7099
One Shields Ave.
University of California at Davis
Davis,
CA 95616,
USA