[Bioc-devel] [BioC] Rsubread crashes in 32bit linux

Martin Morgan mtmorgan at fhcrc.org
Wed Jun 6 18:02:43 CEST 2012


On 06/06/2012 06:04 AM, Robert Castelo wrote:
> dear Wei,
>
> thanks for looking into this problem so quickly. i have a suggestion about
> something you say below, but since it's very technical i've decided to
> move this particular thread to bioc-devel just in case other more
> experienced developers can help. please keep reading.
>
> On 06/06/2012 12:10 PM, Wei Shi wrote:
>> Dear Dan,
>>
>> It didn't seem to be a problem of requesting a contiguous 1GB block in our
>> investigation. We tracked the memory usage of the buildindex() function when
>> running it on the yeast genome using a 32-bit VM, and found that the segfault
>> happened right after a request for a few KB of memory was sent to the
>> system when the memory parameter was set to 2500. However, the problem
>> was gone when the memory parameter was changed to 1000.
>
> i understand from what you write here that the fact that the problem
> goes away when the memory parameter is changed to 1000 does not explain the
> underlying issue that crashes the software.
>
> in my experience, this kind of obscure correlation of behaviour occurs
> due to memory leaks elsewhere in the code, which in general are very
> difficult to identify without using a memory profiling tool.
>
> in case you have not done that yet, i'd recommend you give it a try
> using valgrind. i've taken the liberty of doing it myself in 64bit
> linux, i.e., where the package does not crash.
>
> i think it points out to some problem in the code. if you want to
> reproduce this please put into a file called test.R the following code
> by Dan which reproduces the problem using the latest R and Rsubread
> versions:
>
> =========================test.R==================
> library(Rsubread)
> ref <- system.file("extdata","reference.fa",package="Rsubread")
> path <- system.file("extdata",package="Rsubread")
> buildindex(basename=file.path(path,"reference_index"),reference=ref)
> =================================================
>
> and then call valgrind as follows from the shell of a linux box (it will
> take several minutes):
>
> $ R -d "valgrind --tool=memcheck --leak-check=yes --show-reachable=yes
> --track-origins=yes" --vanilla < test.R &> test.out
>
> (notice that this is a single line on the shell, but the email software
> may break that line)
>
> the output dumped in test.out is quite long but i found the following part:
>
> ==13386== Syscall param write(buf) points to uninitialised byte(s)
> ==13386== at 0x36B8AD3A00: __write_nocancel (in /lib64/libc-2.11.2.so)
> ==13386== by 0x36B8A70FC2: _IO_file_write@@GLIBC_2.2.5 (in /lib64/libc-2.11.2.so)
> ==13386== by 0x36B8A70E89: _IO_file_xsputn@@GLIBC_2.2.5 (in /lib64/libc-2.11.2.so)
> ==13386== by 0x36B8A6707C: fwrite (in /lib64/libc-2.11.2.so)
> ==13386== by 0x390464E3: gvindex_dump (gene-value-index.c:103)
> ==13386== by 0x3904760E: build_gene_index (index-builder.c:135)
> ==13386== by 0x390481C1: main_buildindex (index-builder.c:703)
> ==13386== by 0x3902CE94: R_buildindex_wrapper (R_wrapper.c:31)
> ==13386== by 0x4CB4C10: do_dotCode (dotcode.c:1687)
> ==13386== by 0x4CE487A: Rf_eval (eval.c:492)
> ==13386== by 0x4CEB45F: do_set (eval.c:1711)
> ==13386== by 0x4CE469A: Rf_eval (eval.c:466)
> ==13386== Address 0x1e80eec54 is 5,188 bytes inside a block of size 461,615,034 alloc'd
> ==13386== at 0x4A0515D: malloc (vg_replace_malloc.c:195)
> ==13386== by 0x390463B6: gvindex_init (gene-value-index.c:14)
> ==13386== by 0x39047C08: build_gene_index (index-builder.c:87)
> ==13386== by 0x390481C1: main_buildindex (index-builder.c:703)
> ==13386== by 0x3902CE94: R_buildindex_wrapper (R_wrapper.c:31)
> ==13386== by 0x4CB4C10: do_dotCode (dotcode.c:1687)
> ==13386== by 0x4CE487A: Rf_eval (eval.c:492)
> ==13386== by 0x4CEB45F: do_set (eval.c:1711)
> ==13386== by 0x4CE469A: Rf_eval (eval.c:466)
> ==13386== by 0x4CE6ABF: do_begin (eval.c:1409)
> ==13386== by 0x4CE469A: Rf_eval (eval.c:466)
> ==13386== by 0x4CE7A02: Rf_applyClosure (eval.c:855)
> ==13386== Uninitialised value was created by a heap allocation
> ==13386== at 0x4A0515D: malloc (vg_replace_malloc.c:195)
> ==13386== by 0x390463B6: gvindex_init (gene-value-index.c:14)
> ==13386== by 0x39047C08: build_gene_index (index-builder.c:87)
> ==13386== by 0x390481C1: main_buildindex (index-builder.c:703)
> ==13386== by 0x3902CE94: R_buildindex_wrapper (R_wrapper.c:31)
> ==13386== by 0x4CB4C10: do_dotCode (dotcode.c:1687)
> ==13386== by 0x4CE487A: Rf_eval (eval.c:492)
> ==13386== by 0x4CEB45F: do_set (eval.c:1711)
> ==13386== by 0x4CE469A: Rf_eval (eval.c:466)
> ==13386== by 0x4CE6ABF: do_begin (eval.c:1409)
> ==13386== by 0x4CE469A: Rf_eval (eval.c:466)
> ==13386== by 0x4CE7A02: Rf_applyClosure (eval.c:855)
> ==13386==
> ==13386== Warning: set address range perms: large range [0x1e80ed800, 0x2039287da) (noaccess)
>
> which seems to point to some use of uninitialised values at line 103 of
> gene-value-index.c (gvindex_dump), through a pointer to memory allocated at
> line 14 of gene-value-index.c (gvindex_init), which is called from line 87
> of index-builder.c
>
> i must confess i'm not looking at the source code myself, just trying to
> interpret the valgrind output directly, but this definitely looks leaky
> to me. let me know if you and your collaborators have difficulty spotting
> the problem in your code and i'll try to find a moment to look into it.
>
> another issue, on which others on this list may have more concrete
> advice, is that i'd say you should replace all the calls to
> malloc() with R_alloc(), and any calls to calloc()/free() with
> Calloc()/Free(). these are internal R wrappers for the allocation and
> release of memory which i think make the package more friendly to R and
> to the users when, for instance, main memory is exhausted.

R_alloc allocates 'transient' memory, which means that it is
automatically freed by R when the C routine returns to R. This isn't
necessarily a drop-in replacement for malloc: for instance, you would NOT
Free() memory allocated with R_alloc, and it would not be a good choice
if your C code were making many memory allocations (both for efficiency
reasons and because you would likely be explicitly freeing the memory
as part of the overall memory management strategy).

Using Calloc / Free is definitely appropriate, and Calloc is a 
not-too-expensive replacement for malloc.
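
As a rough illustration of the "uninitialised byte(s)" report in the
valgrind output quoted above, here is a minimal sketch in plain C. It is
not Rsubread code: demo_index and its functions are hypothetical names,
and it assumes the warning arises from writing out parts of a malloc'd
buffer that were never filled in. The sketch allocates with calloc so
that every byte is defined, and it dumps only the bytes actually used.
In a package, Calloc()/Free() would take the place of calloc()/free();
as far as I know, Calloc also returns zeroed memory.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an index buffer; not taken from Rsubread. */
typedef struct {
    unsigned char *values;
    size_t capacity;  /* bytes allocated */
    size_t used;      /* highest written position + 1 */
} demo_index;

/* Allocate with calloc so that every byte is defined (zero),
 * even if the caller never writes to it. Returns 0 on success. */
int demo_index_init(demo_index *idx, size_t capacity)
{
    idx->values = calloc(capacity, 1);
    if (idx->values == NULL)
        return -1;
    idx->capacity = capacity;
    idx->used = 0;
    return 0;
}

/* Store one byte at pos; rejects out-of-range positions with -1. */
int demo_index_put(demo_index *idx, size_t pos, unsigned char v)
{
    if (pos >= idx->capacity)
        return -1;
    idx->values[pos] = v;
    if (pos + 1 > idx->used)
        idx->used = pos + 1;
    return 0;
}

/* Write only the used prefix of the buffer, not the whole allocation.
 * Returns the number of bytes written. */
size_t demo_index_dump(const demo_index *idx, FILE *fp)
{
    assert(idx->used <= idx->capacity);
    return fwrite(idx->values, 1, idx->used, fp);
}

/* Release the buffer and reset the bookkeeping fields. */
void demo_index_free(demo_index *idx)
{
    free(idx->values);
    idx->values = NULL;
    idx->capacity = 0;
    idx->used = 0;
}
```

I would expect the opposite pattern (malloc followed by an fwrite of
bytes that were never assigned) to produce a report similar to the one
above when run under valgrind, whereas this version should not.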

Martin

>
> cheers,
> robert.
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


