[Bioc-devel] [BioC] Rsubread crashes in 32bit linux

Robert Castelo robert.castelo at upf.edu
Wed Jun 6 15:04:26 CEST 2012


Dear Wei,

Thanks for looking into this problem so quickly. I have a suggestion
about something you say below, but since it is very technical I've
decided to move this particular thread to bioc-devel in case other,
more experienced developers can help. Please keep reading.

On 06/06/2012 12:10 PM, Wei Shi wrote:
> Dear Dan,
>
> It didn't seem to be a problem of requesting a contiguous 1GB block in our
> investigation. We tracked the memory usage of the buildindex() function when
> running it on the yeast genome in a 32-bit VM, and found that the segfault
> happened right after a request for a few KB of memory was sent to the
> system when the memory parameter was set to 2500. However, the problem
> went away when the memory parameter was changed to 1000.

I take from what you write here that, although the problem goes away
when the memory parameter is changed to 1000, this does not explain the
underlying issue that crashes the software.

In my experience, this kind of obscure correlation in behaviour is
often caused by memory errors elsewhere in the code (leaks,
out-of-bounds accesses, use of uninitialised memory), which are in
general very difficult to identify without a memory-checking tool.

In case you have not done so yet, I'd recommend giving valgrind a try.
I've taken the liberty of running it myself on 64-bit Linux, i.e.,
where the package does not crash.

I think it points to a problem in the code. If you want to reproduce
this, put the following code by Dan, which reproduces the problem with
the latest R and Rsubread versions, into a file called test.R:

=========================test.R==================
library(Rsubread)
ref <- system.file("extdata","reference.fa",package="Rsubread")
path <- system.file("extdata",package="Rsubread")
buildindex(basename=file.path(path,"reference_index"),reference=ref)
=================================================

Then run valgrind from the shell of a Linux box as follows (it will
take several minutes):

$ R -d "valgrind --tool=memcheck --leak-check=yes --show-reachable=yes --track-origins=yes" --vanilla < test.R &> test.out

(note that this is a single line in the shell, although the email
software may break it)

The output written to test.out is quite long, but I found the following part:

==13386== Syscall param write(buf) points to uninitialised byte(s)
==13386==    at 0x36B8AD3A00: __write_nocancel (in /lib64/libc-2.11.2.so)
==13386==    by 0x36B8A70FC2: _IO_file_write@@GLIBC_2.2.5 (in /lib64/libc-2.11.2.so)
==13386==    by 0x36B8A70E89: _IO_file_xsputn@@GLIBC_2.2.5 (in /lib64/libc-2.11.2.so)
==13386==    by 0x36B8A6707C: fwrite (in /lib64/libc-2.11.2.so)
==13386==    by 0x390464E3: gvindex_dump (gene-value-index.c:103)
==13386==    by 0x3904760E: build_gene_index (index-builder.c:135)
==13386==    by 0x390481C1: main_buildindex (index-builder.c:703)
==13386==    by 0x3902CE94: R_buildindex_wrapper (R_wrapper.c:31)
==13386==    by 0x4CB4C10: do_dotCode (dotcode.c:1687)
==13386==    by 0x4CE487A: Rf_eval (eval.c:492)
==13386==    by 0x4CEB45F: do_set (eval.c:1711)
==13386==    by 0x4CE469A: Rf_eval (eval.c:466)
==13386==  Address 0x1e80eec54 is 5,188 bytes inside a block of size 461,615,034 alloc'd
==13386==    at 0x4A0515D: malloc (vg_replace_malloc.c:195)
==13386==    by 0x390463B6: gvindex_init (gene-value-index.c:14)
==13386==    by 0x39047C08: build_gene_index (index-builder.c:87)
==13386==    by 0x390481C1: main_buildindex (index-builder.c:703)
==13386==    by 0x3902CE94: R_buildindex_wrapper (R_wrapper.c:31)
==13386==    by 0x4CB4C10: do_dotCode (dotcode.c:1687)
==13386==    by 0x4CE487A: Rf_eval (eval.c:492)
==13386==    by 0x4CEB45F: do_set (eval.c:1711)
==13386==    by 0x4CE469A: Rf_eval (eval.c:466)
==13386==    by 0x4CE6ABF: do_begin (eval.c:1409)
==13386==    by 0x4CE469A: Rf_eval (eval.c:466)
==13386==    by 0x4CE7A02: Rf_applyClosure (eval.c:855)
==13386==  Uninitialised value was created by a heap allocation
==13386==    at 0x4A0515D: malloc (vg_replace_malloc.c:195)
==13386==    by 0x390463B6: gvindex_init (gene-value-index.c:14)
==13386==    by 0x39047C08: build_gene_index (index-builder.c:87)
==13386==    by 0x390481C1: main_buildindex (index-builder.c:703)
==13386==    by 0x3902CE94: R_buildindex_wrapper (R_wrapper.c:31)
==13386==    by 0x4CB4C10: do_dotCode (dotcode.c:1687)
==13386==    by 0x4CE487A: Rf_eval (eval.c:492)
==13386==    by 0x4CEB45F: do_set (eval.c:1711)
==13386==    by 0x4CE469A: Rf_eval (eval.c:466)
==13386==    by 0x4CE6ABF: do_begin (eval.c:1409)
==13386==    by 0x4CE469A: Rf_eval (eval.c:466)
==13386==    by 0x4CE7A02: Rf_applyClosure (eval.c:855)
==13386==
==13386== Warning: set address range perms: large range [0x1e80ed800, 0x2039287da) (noaccess)

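This seems to point to uninitialised bytes being written to disk by
fwrite() in gvindex_dump (line 103 of gene-value-index.c), from a
buffer allocated with malloc() in gvindex_init (line 14 of
gene-value-index.c), which is called from build_gene_index at line 87
of index-builder.c.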

I must confess I have not looked at the source code myself; I'm only
interpreting the valgrind output directly, but this definitely looks
like a memory-handling bug to me. Let me know if you and your
collaborators have difficulty spotting the problem in your code, and
I'll try to find a moment to look into it.

On another issue, where others on this list may have more concrete
advice: I'd suggest replacing all calls to malloc() with R_alloc(),
and any calls to calloc()/free() with Calloc()/Free(). These are R's
own wrappers for allocating and releasing memory, and I think they make
the package friendlier to R and to its users when, for instance, main
memory is exhausted.

cheers,
robert.


