[Rd] memory leak in sub("[range]",...)
Martin Maechler
maechler at stat.math.ethz.ch
Tue Jul 15 10:11:06 CEST 2008
>>>>> "BD" == Bill Dunlap <bill at insightful.com>
>>>>> on Wed, 9 Jul 2008 11:26:50 -0700 (PDT) writes:
BD> There is a 2-block memory leak in the sub() (or any other regex-related
BD> function, probably) when the pattern argument involves a range
BD> expression, e.g., '[0-9]'.
BD> % R --debugger=valgrind --debugger-args=--leak-check=full --vanilla
BD> ==14519== Memcheck, a memory error detector.
BD> ==14519== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
BD> ==14519== Using LibVEX rev 1658, a library for dynamic binary translation.
BD> ==14519== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
BD> ==14519== Using valgrind-3.2.1, a dynamic binary instrumentation framework.
BD> ==14519== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
BD> ==14519== For more details, rerun with: -v
BD> ==14519==
BD> R version 2.8.0 Under development (unstable) (2008-07-07 r46046)
BD> ...
>> for(i in 1:1000)sub("[a-c]","+","0abcd")
>> q()
BD> ==32503==
BD> ==32503== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 40 from 2)
BD> ==32503== malloc/free: in use at exit: 12,603,409 bytes in 7,915 blocks.
BD> ==32503== malloc/free: 61,973 allocs, 54,058 frees, 54,494,371 bytes allocated.
BD> ==32503== For counts of detected errors, rerun with: -v
BD> ==32503== searching for pointers to 7,915 not-freed blocks.
BD> ==32503== checked 12,616,568 bytes.
BD> ==32503==
BD> ==32503== 4 bytes in 1 blocks are possibly lost in loss record 1 of 45
BD> ==32503== at 0x40046EE: malloc (vg_replace_malloc.c:149)
BD> ==32503== by 0x4005B9A: realloc (vg_replace_malloc.c:306)
BD> ==32503== by 0x80A5F92: parse_expression (regex.c:5202)
BD> ==32503== by 0x80A614F: parse_branch (regex.c:4707)
BD> ==32503== by 0x80A621A: parse_reg_exp (regex.c:4666)
BD> ==32503== by 0x80A6618: Rf_regcomp (regex.c:4635)
BD> ==32503== by 0x8110CB4: do_gsub (character.c:1355)
BD> ==32503== by 0x80654A4: do_internal (names.c:1135)
BD> ==32503== by 0x815F0EB: Rf_eval (eval.c:461)
BD> ==32503== by 0x8160DA7: do_begin (eval.c:1174)
BD> ==32503== by 0x815F0EB: Rf_eval (eval.c:461)
BD> ==32503== by 0x8162210: Rf_applyClosure (eval.c:667)
BD> ==32503==
BD> ... ignore 85 byte/4 block leak in readline ...
BD> ==32503== 7,980 bytes in 1,995 blocks are definitely lost in loss record 36 of 45
BD> ==32503== at 0x40046EE: malloc (vg_replace_malloc.c:149)
BD> ==32503== by 0x4005B9A: realloc (vg_replace_malloc.c:306)
BD> ==32503== by 0x80A5F92: parse_expression (regex.c:5202)
BD> ==32503== by 0x80A614F: parse_branch (regex.c:4707)
BD> ==32503== by 0x80A621A: parse_reg_exp (regex.c:4666)
BD> ==32503== by 0x80A6618: Rf_regcomp (regex.c:4635)
BD> ==32503== by 0x8110CB4: do_gsub (character.c:1355)
BD> ==32503== by 0x80654A4: do_internal (names.c:1135)
BD> ==32503== by 0x815F0EB: Rf_eval (eval.c:461)
BD> ==32503== by 0x8160DA7: do_begin (eval.c:1174)
BD> ==32503== by 0x815F0EB: Rf_eval (eval.c:461)
BD> ==32503== by 0x8162210: Rf_applyClosure (eval.c:667)
BD> The leaked blocks are allocated in internal_function build_range_exp() at
BD> 5200 /* Use realloc since mbcset->range_starts and mbcset-> range_ends
BD> 5201 are NULL if *range_alloc == 0. */
BD> 5202 new_array_start = re_realloc (mbcset->range_starts, wchar_t,
BD> 5203 new_nranges);
BD> 5204 new_array_end = re_realloc (mbcset->range_ends, wchar_t,
BD> 5205 new_nranges);
BD> ...
BD> 5210 mbcset->range_starts = new_array_start;
BD> 5211 mbcset->range_ends = new_array_end;
BD> This file, src/main/regex.c, contains a complicated mess of #ifdef's
((note that these were not
BD> but range_starts and range_ends are defined and appear to be used
BD> whether or not _LIBC is defined. However, they are only freed if _LIBC
BD> is defined. In my setup (Linux, gcc 3.4.5) _LIBC is not defined so
BD> they don't get freed.
OK, this all makes sense; I've seen the same in the source.
Interestingly, my newer setup (Linux, gcc 4.2.x ...) does not show the
memory leak; I've not checked if it's because _LIBC is defined
or for another reason.
I'm applying your patch --- thank you, Bill.
Martin
BD> After the following change in free_charset() only the 85 byte/4 block
BD> leak in readline remains.
BD> Index: regex.c
BD> ===================================================================
BD> --- regex.c (revision 46046)
BD> +++ regex.c (working copy)
BD> @@ -6240,9 +6240,9 @@
BD> # ifdef _LIBC
BD> re_free (cset->coll_syms);
BD> re_free (cset->equiv_classes);
BD> +# endif
BD> re_free (cset->range_starts);
BD> re_free (cset->range_ends);
BD> -# endif
BD> re_free (cset->char_classes);
BD> re_free (cset);
BD> }
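The effect of moving the #endif can be illustrated with a small standalone C sketch. This is not the code in src/main/regex.c: the struct toy_charset, the counting wrappers, and the `patched` flag (standing in for the position of the `# endif` when _LIBC is not defined) are all hypothetical names made up for the illustration. The only thing taken from the report is the pattern itself: the two range arrays are always grown with realloc, but were only freed inside the _LIBC block.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical stand-in for the charset struct; only the two range
   arrays discussed in the report are modelled. */
typedef struct {
    wchar_t *range_starts;
    wchar_t *range_ends;
    int nranges;
    int range_alloc;
} toy_charset;

/* Number of blocks currently allocated through the wrappers below. */
static long live_blocks = 0;

static void *counted_realloc(void *p, size_t n) {
    void *q = realloc(p, n);
    if (q != NULL && p == NULL)
        live_blocks++;              /* a new block came into existence */
    return q;
}

static void counted_free(void *p) {
    if (p != NULL)
        live_blocks--;
    free(p);
}

/* Append one range, growing both arrays unconditionally
   (the arrays start out NULL, so realloc acts like malloc). */
static int toy_add_range(toy_charset *cs, wchar_t start, wchar_t end) {
    if (cs->nranges == cs->range_alloc) {
        int new_n = 2 * cs->range_alloc + 1;
        wchar_t *s = counted_realloc(cs->range_starts,
                                     (size_t)new_n * sizeof(wchar_t));
        if (s == NULL)
            return -1;
        cs->range_starts = s;
        wchar_t *e = counted_realloc(cs->range_ends,
                                     (size_t)new_n * sizeof(wchar_t));
        if (e == NULL)
            return -1;
        cs->range_ends = e;
        cs->range_alloc = new_n;
    }
    cs->range_starts[cs->nranges] = start;
    cs->range_ends[cs->nranges] = end;
    cs->nranges++;
    return 0;
}

/* patched == 0 mimics the old layout with _LIBC undefined: the range
   arrays are skipped.  patched == 1 mimics the layout after the patch. */
static void toy_free_charset(toy_charset *cs, int patched) {
    if (patched) {
        counted_free(cs->range_starts);
        counted_free(cs->range_ends);
    }
    cs->range_starts = NULL;
    cs->range_ends = NULL;
    cs->nranges = 0;
    cs->range_alloc = 0;
}

/* Build and tear down one single-range charset per iteration and
   report how many blocks are still outstanding afterwards. */
static long toy_leaked_blocks(int iterations, int patched) {
    long before = live_blocks;
    for (int i = 0; i < iterations; i++) {
        toy_charset cs = {NULL, NULL, 0, 0};
        if (toy_add_range(&cs, L'a', L'c') != 0)
            break;
        toy_free_charset(&cs, patched);
    }
    return live_blocks - before;
}
```

With patched set to 0 the sketch leaves two blocks outstanding per iteration, the same 2-block-per-call shape as in the valgrind output; with patched set to 1 the count goes back to zero.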
BD> [This report may be a duplicate: I tried submitting it via the form in
BD> http://bugs.r-project.org/cgi-bin/R, but I cannot find it there now.]
Neither do I.
The machine running the repository had some downtime a couple of days
ago (announced by Peter Dalgaard), so this may be related.
BD> ----------------------------------------------------------------------------
BD> Bill Dunlap
BD> Insightful Corporation
BD> bill at insightful dot com