[Bioc-devel] [BioC] Rsubread crashes in 32bit linux

Wei Shi shi at wehi.EDU.AU
Thu Jun 7 04:55:59 CEST 2012


Dear Robert and Martin,

Thanks for your comments and suggestions.

I agree that Calloc/Free are better than calloc/free for memory management in an R package. But because we maintain both an R version and a C version of our subread aligner package, we are inclined to use a single set of memory functions rather than one set for R and another for C.
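
One way to keep a single set of calls in a shared code base is a thin macro layer chosen at build time. The following is only a sketch under stated assumptions: the names SUBREAD_CALLOC, SUBREAD_FREE, checked_calloc and new_counter_table, and the build flag BUILD_FOR_R, are hypothetical and not taken from Rsubread. Calloc() and Free() are the macros R has traditionally provided via R.h; more recent R releases spell them R_Calloc() and R_Free().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical shim: macro names and the BUILD_FOR_R flag are illustrative. */
#ifdef BUILD_FOR_R
#include <R.h>                          /* provides Calloc() and Free() */
#define SUBREAD_CALLOC(n, type) Calloc((n), type)
#define SUBREAD_FREE(p)         Free(p)
#else
/* Standalone C build: plain calloc() with an explicit failure check. */
static void *checked_calloc(size_t n, size_t size)
{
    void *p = calloc(n, size);
    if (p == NULL) {
        fprintf(stderr, "out of memory requesting %zu x %zu bytes\n", n, size);
        exit(EXIT_FAILURE);
    }
    return p;
}
#define SUBREAD_CALLOC(n, type) ((type *) checked_calloc((n), sizeof(type)))
/* Mirror Free(): release the block and reset the pointer to NULL. */
#define SUBREAD_FREE(p)         do { free(p); (p) = NULL; } while (0)
#endif

/* Example caller: the same source line works in both builds. */
static unsigned int *new_counter_table(size_t n_entries)
{
    assert(n_entries > 0);
    return SUBREAD_CALLOC(n_entries, unsigned int);
}
```

With something like this, the aligner code would call only SUBREAD_CALLOC and SUBREAD_FREE, and the R build would get R's error handling on allocation failure without a second code path.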

We will run valgrind to check whether our functions leak memory.
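
For reference, the valgrind report quoted below flags uninitialised bytes reaching write() from gvindex_dump, which is a different class of problem from a leak. It typically appears when a buffer obtained from malloc() is only partly filled and then written out in full. A minimal sketch of that pattern and two common remedies follows; the value_block structure and its functions are hypothetical and do not reflect the actual gvindex code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical structure for illustration; not the real gvindex layout. */
typedef struct {
    size_t capacity;        /* bytes allocated */
    size_t used;            /* bytes actually filled in */
    unsigned char *values;
} value_block;

static int block_init(value_block *b, size_t capacity)
{
    /* Remedy 1: calloc() zero-fills, so every byte is defined even if
     * it is never written. malloc() would leave the contents undefined. */
    b->values = calloc(capacity, 1);
    b->capacity = capacity;
    b->used = 0;
    return b->values != NULL ? 0 : -1;
}

static void block_put(value_block *b, unsigned char v)
{
    assert(b->used < b->capacity);
    b->values[b->used++] = v;
}

/* Remedy 2: write only the bytes that were filled in, not the whole
 * allocation. Returns the number of bytes written. */
static size_t block_dump(const value_block *b, FILE *fp)
{
    return fwrite(b->values, 1, b->used, fp);
}
```

Either remedy alone should silence this kind of report; whether it applies to our code depends on how the index buffer is actually filled before it is dumped.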

I had a close look at the functions we added to Rsubread in the last 8 months (after Rsubread version 1.1.1) and found that some of them declare large global static objects (one of which is larger than 2GB). We suspect these are causing the problems with compilation, library loading, and running functions such as buildindex(), especially on 32-bit machines, where an R process can address only about 3GB of memory. We can now reproduce Dan's compilation problem (frozen system and very high memory usage). We are going to replace these static objects with dynamically allocated memory and will come up with fixes in a couple of days. Hopefully that will solve these problems.
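
The kind of change we have in mind can be sketched as follows. This is an illustration only: big_table, big_table_acquire and big_table_release are hypothetical names, and the element count in the comment is made up to give an object of about 2.4GB. A file-scope array has its size fixed at build time and needs its full address range when the library is loaded, whereas a heap request can be sized at run time and can fail gracefully.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Before (sketch): a file-scope array sized at build time, e.g.
 *     static unsigned int big_table[600000000];
 * Hypothetical size, roughly 2.4GB with 4-byte unsigned int. */

/* After: only a pointer lives at file scope; memory is requested on demand. */
static unsigned int *big_table = NULL;
static size_t big_table_len = 0;

/* Returns 0 on success, -1 if the request is invalid or cannot be met. */
static int big_table_acquire(size_t n_entries)
{
    assert(big_table == NULL);          /* do not acquire twice */
    /* Reject sizes whose byte count would overflow size_t, which is
     * easy to hit on a 32-bit build. */
    if (n_entries == 0 || n_entries > SIZE_MAX / sizeof *big_table)
        return -1;
    big_table = malloc(n_entries * sizeof *big_table);
    if (big_table == NULL)
        return -1;
    big_table_len = n_entries;
    return 0;
}

static void big_table_release(void)
{
    free(big_table);
    big_table = NULL;
    big_table_len = 0;
}
```

A caller would acquire the table at the start of index building, check the return value, and release it before returning to R, so nothing large stays resident between calls.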

Cheers,
Wei

On Jun 7, 2012, at 2:02 AM, Martin Morgan wrote:

> On 06/06/2012 06:04 AM, Robert Castelo wrote:
>> dear Wei,
>> 
>> thanks for getting into this problem so quickly. i have a suggestion about
>> something you say below, but since it's very technical i've decided to
>> move this particular thread to bioc-devel just in case other more
>> experienced developers can help, please keep reading.
>> 
>> On 06/06/2012 12:10 PM, Wei Shi wrote:
>>> Dear Dan,
>>> 
>>> It didn't seem to be a problem of requesting a continuous 1GB block in our
>>> investigation. We tracked the memory usage of buildindex() function when
>>> running it on yeast genome using a 32-bit VM, and found that the segfault
>>> happened right after a request of a few KB of memory was sent to the
>>> system when the memory parameter was set to 2500. However, the problem
>>> was gone when the memory parameter was changed to 1000.
>> 
>> i understand from what you write here that the fact that the problem may
>> be gone by changing the memory parameter to 1000, does not explain the
>> underlying issue that crashes the software.
>> 
>> in my experience, these kinds of obscure correlations of behaviour occur
>> due to memory leaks elsewhere in the code, which in general are very
>> difficult to identify without using a memory profiling tool.
>> 
>> in case you have not done that yet, i'd recommend you give it a try
>> using valgrind. i've taken the liberty of doing it myself in 64bit
>> linux, i.e., where the package does not crash.
>> 
>> i think it points out to some problem in the code. if you want to
>> reproduce this please put into a file called test.R the following code
>> by Dan which reproduces the problem using the latest R and Rsubread
>> versions:
>> 
>> =========================test.R==================
>> library(Rsubread)
>> ref <- system.file("extdata","reference.fa",package="Rsubread")
>> path <- system.file("extdata",package="Rsubread")
>> buildindex(basename=file.path(path,"reference_index"),reference=ref)
>> =================================================
>> 
>> and then call valgrind as follows from the shell of a linux box (it will
>> take several minutes):
>> 
>> $ R -d "valgrind --tool=memcheck --leak-check=yes --show-reachable=yes
>> --track-origins=yes" --vanilla < test.R &> test.out
>> 
>> (notice that this is a single line on the shell, but the email software
>> may break that line)
>> 
>> the output dumped in test.out is quite long but i found the following part:
>> 
>> ==13386== Syscall param write(buf) points to uninitialised byte(s)
>> ==13386== at 0x36B8AD3A00: __write_nocancel (in /lib64/libc-2.11.2.so)
>> ==13386== by 0x36B8A70FC2: _IO_file_write@@GLIBC_2.2.5 (in
>> /lib64/libc-2.11.2.so)
>> ==13386== by 0x36B8A70E89: _IO_file_xsputn@@GLIBC_2.2.5 (in
>> /lib64/libc-2.11.2.so)
>> ==13386== by 0x36B8A6707C: fwrite (in /lib64/libc-2.11.2.so)
>> ==13386== by 0x390464E3: gvindex_dump (gene-value-index.c:103)
>> ==13386== by 0x3904760E: build_gene_index (index-builder.c:135)
>> ==13386== by 0x390481C1: main_buildindex (index-builder.c:703)
>> ==13386== by 0x3902CE94: R_buildindex_wrapper (R_wrapper.c:31)
>> ==13386== by 0x4CB4C10: do_dotCode (dotcode.c:1687)
>> ==13386== by 0x4CE487A: Rf_eval (eval.c:492)
>> ==13386== by 0x4CEB45F: do_set (eval.c:1711)
>> ==13386== by 0x4CE469A: Rf_eval (eval.c:466)
>> ==13386== Address 0x1e80eec54 is 5,188 bytes inside a block of size
>> 461,615,034 alloc'd
>> ==13386== at 0x4A0515D: malloc (vg_replace_malloc.c:195)
>> ==13386== by 0x390463B6: gvindex_init (gene-value-index.c:14)
>> ==13386== by 0x39047C08: build_gene_index (index-builder.c:87)
>> ==13386== by 0x390481C1: main_buildindex (index-builder.c:703)
>> ==13386== by 0x3902CE94: R_buildindex_wrapper (R_wrapper.c:31)
>> ==13386== by 0x4CB4C10: do_dotCode (dotcode.c:1687)
>> ==13386== by 0x4CE487A: Rf_eval (eval.c:492)
>> ==13386== by 0x4CEB45F: do_set (eval.c:1711)
>> ==13386== by 0x4CE469A: Rf_eval (eval.c:466)
>> ==13386== by 0x4CE6ABF: do_begin (eval.c:1409)
>> ==13386== by 0x4CE469A: Rf_eval (eval.c:466)
>> ==13386== by 0x4CE7A02: Rf_applyClosure (eval.c:855)
>> ==13386== Uninitialised value was created by a heap allocation
>> ==13386== at 0x4A0515D: malloc (vg_replace_malloc.c:195)
>> ==13386== by 0x390463B6: gvindex_init (gene-value-index.c:14)
>> ==13386== by 0x39047C08: build_gene_index (index-builder.c:87)
>> ==13386== by 0x390481C1: main_buildindex (index-builder.c:703)
>> ==13386== by 0x3902CE94: R_buildindex_wrapper (R_wrapper.c:31)
>> ==13386== by 0x4CB4C10: do_dotCode (dotcode.c:1687)
>> ==13386== by 0x4CE487A: Rf_eval (eval.c:492)
>> ==13386== by 0x4CEB45F: do_set (eval.c:1711)
>> ==13386== by 0x4CE469A: Rf_eval (eval.c:466)
>> ==13386== by 0x4CE6ABF: do_begin (eval.c:1409)
>> ==13386== by 0x4CE469A: Rf_eval (eval.c:466)
>> ==13386== by 0x4CE7A02: Rf_applyClosure (eval.c:855)
>> ==13386==
>> ==13386== Warning: set address range perms: large range [0x1e80ed800,
>> 0x2039287da) (noaccess)
>> 
>> which seems to point to some use of uninitialised values in line 103 of
>> gene-value-index.c through a pointer to memory allocated via line 87 of
>> index-builder.c
>> 
>> i must confess i'm not looking at the source code myself, just trying to
>> interpret the valgrind output directly, but this looks definitely leaky
>> to me. let me know if you and your collaborators have difficulty spotting
>> the problem in your code and i'll try to find a moment to look into it.
>> 
>> another issue, on which others in this list may have more concrete
>> advice, is that i'd say you should replace all the calls to
>> malloc() by R_alloc() and, if you have any calls to calloc()/free(), by
>> Calloc()/Free(). these are internal R wrappers for the allocation and
>> release of memory which i think make the package more friendly to R and
>> to the users when, for instance, main memory is exhausted.
> 
> R_alloc allocates 'transient' memory, which means that it is automatically freed by R when the C routine returns to R. This isn't necessarily a drop-in replacement for malloc, for instance you would NOT Free() memory allocated with R_alloc and it would not be a good choice if your C code were making many memory allocations (both for efficiency reasons and because you would likely be explicitly free'ing the memory as part of the overall memory management strategy).
> 
> Using Calloc / Free is definitely appropriate, and Calloc is a not-too-expensive replacement for malloc.
> 
> Martin
> 
>> 
>> cheers,
>> robert.
>> 
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> 
> 
> -- 
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
> 
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793

