[Bioc-devel] R version-dependent segfault

Thu Jan 5 20:32:16 CET 2017

On 01/05/2017 11:10 AM, Vladimir Kiselev wrote:
> Dear Martin,
>
> Many thanks for your reply, it was really helpful. My collaborator ran
> the commands you suggested and got the following output:
>
> *****
> $ Rscript norm_laplacian.R
> /home/jake/miniconda3/lib/R/bin/R CMD SHLIB -o 'sourceCpp_2.so'
>  'norm_laplacian.cpp'
> g++ -I/home/jake/miniconda3/lib/R/include -DNDEBUG
>  -I/home/jake/miniconda3/include
>  -I"/home/jake/R/x86_64-pc-linux-gnu-library/3.3/Rcpp/include"
> -I"/home/jake/R/x86_64-pc-linux-gnu-library/3.3/RcppArmadillo/include"
> -I"/home/jake/Documents"   -fpic  -I/home/jake/miniconda3/include  -c
> norm_laplacian.cpp -o norm_laplacian.o
> g++ -shared -L/home/jake/miniconda3/lib/R/lib
> -L/home/jake/miniconda3/lib -lgfortran -o sourceCpp_2.so
> norm_laplacian.o -L/home/jake/miniconda3/lib/R/lib -lRlapack
> -L/home/jake/miniconda3/lib/R/lib -lRblas -lgfortran -lm -lquadmath
> -L/home/jake/miniconda3/lib/R/lib -lR
> R version 3.3.2 (2016-10-31)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 16.10
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  base
>
> other attached packages:
> [1] Rcpp_0.12.8
>
> loaded via a namespace (and not attached):
> [1] tools_3.3.2               RcppArmadillo_0.7.600.1.0
>
> $ g++ --version|head -n1
> g++ (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005
> *****
>
> So running the simplified code did not produce a segfault and suggested
> that the problem was in SC3::norm_laplacian(). And indeed, running
> valgrind with SC3::norm_laplacian(matrix(runif(100), nrow = 10)) did
> catch the error:
>
> *****
> $ R -d valgrind -f norm_laplacian.R
> ==7046== Memcheck, a memory error detector
> ==7046== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
> ==7046== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for
> copyright info
> ==7046== Command: /home/jake/miniconda3/lib/R/bin/exec/R -f norm_laplacian.R
> ==7046==
>
> R version 3.3.2 (2016-10-31) -- "Sincere Pumpkin Patch"
> Copyright (C) 2016 The R Foundation for Statistical Computing
> Platform: x86_64-pc-linux-gnu (64-bit)
>
>> library(Rcpp)
>> sourceCpp("norm_laplacian.cpp", showOutput=TRUE)
> /home/jake/miniconda3/lib/R/bin/R CMD SHLIB -o 'sourceCpp_2.so'
>  'norm_laplacian.cpp'
> g++ -I/home/jake/miniconda3/lib/R/include -DNDEBUG
>  -I/home/jake/miniconda3/include
>  -I"/home/jake/R/x86_64-pc-linux-gnu-library/3.3/Rcpp/include"
> -I"/home/jake/R/x86_64-pc-linux-gnu-library/3.3/RcppArmadillo/include"
> -I"/home/jake/Documents"   -fpic  -I/home/jake/miniconda3/include  -c
> norm_laplacian.cpp -o norm_laplacian.o
> g++ -shared -L/home/jake/miniconda3/lib/R/lib
> -L/home/jake/miniconda3/lib -lgfortran -o sourceCpp_2.so
> norm_laplacian.o -L/home/jake/miniconda3/lib/R/lib -lRlapack
> -L/home/jake/miniconda3/lib/R/lib -lRblas -lgfortran -lm -lquadmath
> -L/home/jake/miniconda3/lib/R/lib -lR
>> xx <- norm_laplacian(matrix(runif(100), nrow = 10))
>> SC3::norm_laplacian(matrix(runif(100), nrow = 10))
> ==7046== Use of uninitialised value of size 8
> ==7046==    at 0x213645B8: direct_max<double> (op_max_meat.hpp:362)
> ==7046==    by 0x213645B8: max (Mat_meat.hpp:6801)
> ==7046==    by 0x213645B8: norm_laplacian(arma::Mat<double>)
> (cppFunctions.cpp:87)
> ==7046==    by 0x21360E8C: SC3_norm_laplacian (RcppExports.cpp:49)
> ==7046==    by 0x521DE81: R_doDotCall (in
> /home/jake/miniconda3/lib/R/lib/libR.so)
> ==7046==    by 0x522BD99: do_dotcall (in
> /home/jake/miniconda3/lib/R/lib/libR.so)
> ==7046==    by 0x5267AAC: Rf_eval (in
> /home/jake/miniconda3/lib/R/lib/libR.so)
> ==7046==    by 0x526B0A7: do_begin (in
> /home/jake/miniconda3/lib/R/lib/libR.so)
> ==7046==    by 0x526789E: Rf_eval (in
> /home/jake/miniconda3/lib/R/lib/libR.so)
> ==7046==    by 0x5268B8C: Rf_applyClosure (in
> /home/jake/miniconda3/lib/R/lib/libR.so)
> ==7046==    by 0x5267BBF: Rf_eval (in
> /home/jake/miniconda3/lib/R/lib/libR.so)
> ==7046==    by 0x52A6C1E: Rf_ReplIteration (in
> /home/jake/miniconda3/lib/R/lib/libR.so)
> ==7046==    by 0x52A6DD1: R_ReplConsole (in
> /home/jake/miniconda3/lib/R/lib/libR.so)
> ==7046==    by 0x52A8758: run_Rmainloop (in
> /home/jake/miniconda3/lib/R/lib/libR.so)
> ==7046==
> ==7046== Invalid read of size 8
> ==7046==    at 0x213645B8: direct_max<double> (op_max_meat.hpp:362)
> ==7046==    by 0x213645B8: max (Mat_meat.hpp:6801)
> ==7046==    by 0x213645B8: norm_laplacian(arma::Mat<double>)
> (cppFunctions.cpp:87)
> ==7046==    by 0x21360E8C: SC3_norm_laplacian (RcppExports.cpp:49)
> ==7046==    by 0x521DE81: R_doDotCall (in
> /home/jake/miniconda3/lib/R/lib/libR.so)
> ==7046==    by 0x522BD99: do_dotcall (in
> /home/jake/miniconda3/lib/R/lib/libR.so)
> ==7046==    by 0x5267AAC: Rf_eval (in
> /home/jake/miniconda3/lib/R/lib/libR.so)
> ==7046==    by 0x526B0A7: do_begin (in
> /home/jake/miniconda3/lib/R/lib/libR.so)
> ==7046==    by 0x526789E: Rf_eval (in
> /home/jake/miniconda3/lib/R/lib/libR.so)
> ==7046==    by 0x5268B8C: Rf_applyClosure (in
> /home/jake/miniconda3/lib/R/lib/libR.so)
> ==7046==    by 0x5267BBF: Rf_eval (in
> /home/jake/miniconda3/lib/R/lib/libR.so)
> ==7046==    by 0x52A6C1E: Rf_ReplIteration (in
> /home/jake/miniconda3/lib/R/lib/libR.so)
> ==7046==    by 0x52A6DD1: R_ReplConsole (in
> /home/jake/miniconda3/lib/R/lib/libR.so)
> ==7046==    by 0x52A8758: run_Rmainloop (in
> /home/jake/miniconda3/lib/R/lib/libR.so)
> ==7046==  Address 0xa0000000a is not stack'd, malloc'd or (recently) free'd
> ==7046==
>
>  *** caught segfault ***
> address 0xa0000000a, cause 'memory not mapped'
>
> Traceback:
>  1: .Call("SC3_norm_laplacian", PACKAGE = "SC3", A)
>  2: SC3::norm_laplacian(matrix(runif(100), nrow = 10))
> An irrecoverable exception occurred. R is aborting now ...
> ==7046==
> ==7046== Process terminating with default action of signal 11 (SIGSEGV)
> ==7046==    at 0x5C4C4DD: raise (raise.c:53)
> ==7046==    by 0x52A76C4: sigactionSegv (in
> /home/jake/miniconda3/lib/R/lib/libR.so)
> ==7046==    by 0x5C4C62F: ??? (in
> /lib/x86_64-linux-gnu/libpthread-2.24.so <http://libpthread-2.24.so>)
> ==7046==    by 0x213645B7: direct_max<double> (op_max_meat.hpp:360)
> ==7046==    by 0x213645B7: max (Mat_meat.hpp:6801)
> ==7046==    by 0x213645B7: norm_laplacian(arma::Mat<double>)
> (cppFunctions.cpp:87)
> ==7046==
> ==7046== HEAP SUMMARY:
> ==7046==     in use at exit: 149,681,035 bytes in 81,447 blocks
> ==7046==   total heap usage: 275,293 allocs, 193,846 frees, 410,601,938
> bytes allocated
> ==7046==
> ==7046== LEAK SUMMARY:
> ==7046==    definitely lost: 0 bytes in 0 blocks
> ==7046==    indirectly lost: 0 bytes in 0 blocks
> ==7046==      possibly lost: 0 bytes in 0 blocks
> ==7046==    still reachable: 149,681,035 bytes in 81,447 blocks
> ==7046==                       of which reachable via heuristic:
> ==7046==                         newarray           : 7,152 bytes in 1
> blocks
> ==7046==         suppressed: 0 bytes in 0 blocks
> ==7046== Rerun with --leak-check=full to see details of leaked memory
> ==7046==
> ==7046== For counts of detected and suppressed errors, rerun with: -v
> ==7046== Use --track-origins=yes to see where uninitialised values come from
> ==7046== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
> Segmentation fault (core dumped)
> *****
>
> So, it looks like the problem is in finding the max of the input matrix
> A, specifically in the direct_max() function of Armadillo library. But
> the segfault only appears when running SC3::norm_laplacian() and not the
> simplified code. Does it suggest that I have some linking or exporting
> problems in the package? Can you think of any solution to this problem?
>
> In case this can help, the SC3::norm_laplacian() function is stored in
> the file with the following header:
>
> #include <armadillo>
> #include <RcppArmadillo.h>
> using namespace arma;
> using namespace Rcpp;
>

I don't think #include <armadillo> is required. I also don't know C++ 
well enough to know how conflicting symbols in different namespaces are 
resolved. My suggestion would be to remove #include <armadillo> and  and 
to fully qualify symbols arma::colvec, etc in norm_laplacian.

I should have added --vanilla to the incantation

$ R --vanilla -d valgrind -f norm_laplacian.R

Using gdb seems more intimidating than it actually is. One would need to 
make sure ~/.R/Makevars had lines

CFLAGS = -g -O0
CXXFLAGS = -g -O0

install RcppArmadillo and SC3 (the compiler should report flags -g -O0) 
and to confirm that the segfault still occurs (it might not -- it could 
be the result of a subtle bug introduced by optimization). One would then

     R --vanilla -d gdb
     (gdb) r
     > library(SC3)
     > SC3::norm_laplacian(matrix(runif(100), nrow = 10))

probably ending up in the debugger where you could do

     (gdb) up

to go up the call stack unitil you see something like

(gdb) up
#1  0x00007fffe4118ac5 in arma::Mat<double>::max (this=0x7fffffffb490)
     at 
/home/mtmorgan/R/x86_64-pc-linux-gnu-library/3.4-Bioc-3.5/RcppArmadillo/include/armadillo_bits/Mat_meat.hpp:6801
6801	  return op_max::direct_max(memptr(), n_elem);

then down to the line that fails

(gdb) do
#0  arma::op_max::direct_max<double> (X=0xfc73dc0, n_elem=100)
     at 
/home/mtmorgan/R/x86_64-pc-linux-gnu-library/3.4-Bioc-3.5/RcppArmadillo/include/armadillo_bits/op_max_meat.hpp:362
362	    const eT X_i = X[i];

then take a peak at the source code

(gdb) l
357	  eT max_val = priv::most_neg<eT>();
358	
359	  uword i,j;
360	  for(i=0, j=1; j<n_elem; i+=2, j+=2)
361	    {
362	    const eT X_i = X[i];
363	    const eT X_j = X[j];
364	
365	    if(X_i > max_val) { max_val = X_i; }
366	    if(X_j > max_val) { max_val = X_j; }

and check out things to make sure that they make sense

(gdb) p n_elem
$1 = 100
(gdb) p i
$2 = 0
(gdb) p j
$3 = 1
(gdb) call X[i]
$4 = 0.90036606369540095

etc. One might go up the call stack, verifying along the way

(gdb) up
#1  0x00007fffe4118ac5 in arma::Mat<double>::max (this=0x7fffffffb490)
     at 
/home/mtmorgan/R/x86_64-pc-linux-gnu-library/3.4-Bioc-3.5/RcppArmadillo/include/armadillo_bits/Mat_meat.hpp:6801
6801	  return op_max::direct_max(memptr(), n_elem);
(gdb)
#2  0x00007fffe4116cae in norm_laplacian (A=...) at cppFunctions.cpp:86
86	    A = exp(-A/A.max());
(gdb) print A
$6 = {<arma::Base<double, arma::Mat<double> >> = 
{<arma::Base_inv_yes<arma::Mat<double> >> = {<No data fields>}, 
<arma::Base_eval_Mat<double, arma::Mat<double> >> = {<No data fields>}, 
<arma::Base_trans_default<arma::Mat<double> >> = {<No data fields>}, <No 
data fields>}, n_rows = 10, n_cols = 10, n_elem = 100,
   vec_state = 0, mem_state = 0, mem = 0xfc73dc0, mem_local = {
     6.9533484229745318e-310, 6.9533490667683032e-310, 
6.8279872255260272e-321,
     6.9533484229745318e-310, 6.9533491664832504e-310, 
6.9533558068923271e-310,
     6.9533558068921294e-310, 6.9533490666871776e-310, 
4.9406564584124654e-324,
     -3.9321179452085543e+234, 6.9533480320859982e-310,
     1.1680649801909133e-315, 6.9533558069148565e-310, 
6.9533487849390297e-310,
     1.1680649801909133e-315, 6.0259171035421809e-317},
   static is_col = <optimized out>, static is_row = <optimized out>}

My next stop would be the Rcpp mailing list 
http://lists.r-forge.r-project.org/mailman/listinfo/rcpp-devel. They 
will likely either have something useful to say or be reluctant to 
install the 80+ packages that SC3 depends on.

Hope that helps somehow...

Martin

>
> Many thanks in advance,
> Kind regards,
> Vladimir
>
> On Thu, Jan 5, 2017 at 12:19 PM Martin Morgan
> <martin.morgan at roswellpark.org <mailto:martin.morgan at roswellpark.org>>
> wrote:
>
>     On 01/05/2017 06:41 AM, Vladimir Kiselev wrote:
>     > My package (SC3 -
>     http://bioconductor.org/packages/3.4/bioc/html/SC3.html)
>     > has a function that causes R version/platform-dependent seqfault.
>     Here is
>     > the function (it's in C++ using RccpArmadillo):
>     >
>     > arma::mat norm_laplacian(arma::mat A) {
>     >     A = exp(-A/A.max());
>     >     arma::rowvec D_row = pow(sum(A), -0.5);
>     >     A.each_row() %= D_row;
>     >     colvec D_col = conv_to< colvec >::from(D_row);
>     >     A.each_col() %= D_col;
>     >     arma::mat res = eye(A.n_cols, A.n_cols) - A;
>     >     return(res);
>     > }
>     >
>     > The test code that provides a segfault on some R versions/platforms:
>     > SC3::norm_laplacian(matrix(runif(100), nrow = 10))
>
>     The first line of attack is to simplify the problem as much as possible.
>     I did this by writing a C++ file norm_laplacian.cpp
>
>     #include <RcppArmadillo.h>
>
>     using namespace arma;
>
>     // [[Rcpp::depends(RcppArmadillo)]]
>
>     // [[Rcpp::export]]
>     arma::mat norm_laplacian(arma::mat A) {
>          A = exp(-A/A.max());
>          arma::rowvec D_row = pow(sum(A), -0.5);
>          A.each_row() %= D_row;
>          colvec D_col = conv_to< colvec >::from(D_row);
>          A.each_col() %= D_col;
>          arma::mat res = eye(A.n_cols, A.n_cols) - A;
>          return(res);
>     }
>
>     and then in R, e.g., norm_laplacian.R
>
>          library(Rcpp)
>          sourceCpp("norm_laplacian.cpp", showOutput=TRUE)
>          xx <- norm_laplacian(matrix(runif(100), nrow = 10))
>          sessionInfo()
>
>     It would be helpful to use set.seed() to make the example more
>     reproducible. One would hope that
>
>          R -f norm_laplacian.R
>
>     would produce a segfault. Unfortunately not for me. My next step was to
>     run this code under valgrind to look for invalid memory access
>
>          R -d valgrind -f norm_laplacian.R
>
>     again hoping for a report of 'invalid write' or 'invalid read', but
>     again no luck for me.
>
>     You could see if your collaborators are able to generate segfaults with
>     this simpler code. If R -f norm_laplacian.R is sufficient, the next step
>     would be to run it under a C-level debugger like gdb, with some hints at
>     http://bioconductor.org/developers/how-to/c-debugging/
>
>     Here's my output; it's also useful to know information about the
>     compiler, and to pay attention to the compiler options (especially
>     optimization level -O0 for me)
>
>     $ g++ --version|head -n1
>     g++ (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
>
>     $ R --vanilla -f norm_laplacian.R
>      > library(Rcpp)
>      > sourceCpp("norm_laplacian.cpp", showOutput=TRUE)
>     /home/mtmorgan/bin/R-devel/bin/R CMD SHLIB -o 'sourceCpp_2.so'
>     'norm_laplacian.cpp'
>     g++  -I/home/mtmorgan/bin/R-devel/include -DNDEBUG  -I/usr/local/include
>
>     -I"/home/mtmorgan/R/x86_64-pc-linux-gnu-library/3.4-Bioc-3.5/Rcpp/include"
>     -I"/home/mtmorgan/R/x86_64-pc-linux-gnu-library/3.4-Bioc-3.5/RcppArmadillo/include"
>     -I"/tmp"   -fpic  -g -O0 -c norm_laplacian.cpp -o norm_laplacian.o
>     g++ -shared -L/home/mtmorgan/bin/R-devel/lib -L/usr/local/lib -o
>     sourceCpp_2.so norm_laplacian.o -L/home/mtmorgan/bin/R-devel/lib
>     -lRlapack -L/home/mtmorgan/bin/R-devel/lib -lRblas -lgfortran -lm
>     -lquadmath -L/home/mtmorgan/bin/R-devel/lib -lR
>      > xx <- norm_laplacian(matrix(runif(100), nrow = 10))
>      > sessionInfo()
>     R Under development (unstable) (2016-12-20 r71827)
>     Platform: x86_64-pc-linux-gnu (64-bit)
>     Running under: Ubuntu 16.04.1 LTS
>
>     locale:
>       [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>       [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>       [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>       [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>       [9] LC_ADDRESS=C               LC_TELEPHONE=C
>     [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
>     attached base packages:
>     [1] stats     graphics  grDevices utils     datasets  methods   base
>
>     other attached packages:
>     [1] Rcpp_0.12.8.3
>
>     loaded via a namespace (and not attached):
>     [1] compiler_3.4.0            tools_3.4.0
>     [3] RcppArmadillo_0.7.600.1.0
>      >
>
>
>     if the segfault does not occur with the simpler code, then one could try
>     gdb / valgrind with SC3::norm_laplacian(matrix(runif(100), nrow = 10))
>
>     Martin
>
>     >
>     > The segfault usually looks like this:
>     > *** caught segfault ***
>     > address 0x7ffdc981e000, cause 'memory not mapped'
>     >
>     > (where address can be a different sequence)
>     >
>     > So far by a collaborative effort (me and some users of the package) we
>     > figured out configurations that cause or do not cause a segfault:
>     >
>     > * Configurations causing a segfault:
>     >
>     > R version 3.3.2 (2016-10-31)
>     > Platform: x86_64-pc-linux-gnu (64-bit)
>     > Running under: Arch Linux
>     >
>     > R version 3.3.2 (2016-10-31)
>     > Platform: x86_64-pc-linux-gnu (64-bit)
>     > Running under: Ubuntu 16.10
>     >
>     > * Configurations causing no segfault:
>     >
>     > R version 3.3.2 (2016-10-31)
>     > Platform: x86_64-pc-linux-gnu (64-bit)
>     > Running under: Ubuntu 16.04.1 LTS
>     >
>     > R version 3.3.1 (2016-06-21)
>     > Platform: x86_64-pc-linux-gnu (64-bit)
>     > Running under: Ubuntu 14.04.5 LTS
>     >
>     > R version 3.3.0 (2016-05-03)
>     > Platform: x86_64-pc-linux-gnu (64-bit)
>     > Running under: Ubuntu precise (12.04.5 LTS)
>     >
>     > R Under development (unstable) (2016-10-20 r71540)
>     > Platform: x86_64-apple-darwin13.4.0 (64-bit)
>     > Running under: OS X Yosemite 10.10.5
>     >
>     > More details on our discussion can be found here:
>     > https://github.com/hemberg-lab/SC3/issues/33
>     >
>     > Has anybody had a similar issue? Do you have any suggestions on
>     how to fix
>     > this, except rewriting the function in R? Or maybe there already
>     exists a
>     > normalised Laplacian function written in C++?
>     >
>     > Many thanks,
>     > Cheers,
>     > Vladimir
>     >
>
>
>     This email message may contain legally privileged and/or
>     confidential information.  If you are not the intended recipient(s),
>     or the employee or agent responsible for the delivery of this
>     message to the intended recipient(s), you are hereby notified that
>     any disclosure, copying, distribution, or use of this email message
>     is prohibited.  If you have received this message in error, please
>     notify the sender immediately by e-mail and delete this email
>     message from your computer. Thank you.
>
> --
> http://genat.uk

This email message may contain legally privileged and/or...{{dropped:2}}