[Rd] memory leak in sub("[range]",...)
Bill Dunlap
bill at insightful.com
Wed Jul 9 20:26:50 CEST 2008
There is a memory leak in sub() (and probably in every other regex-related
function) when the pattern argument contains a range expression, e.g.,
'[0-9]': two blocks are lost for each pattern compiled.
% R --debugger=valgrind --debugger-args=--leak-check=full --vanilla
==14519== Memcheck, a memory error detector.
==14519== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
==14519== Using LibVEX rev 1658, a library for dynamic binary translation.
==14519== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
==14519== Using valgrind-3.2.1, a dynamic binary instrumentation framework.
==14519== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
==14519== For more details, rerun with: -v
==14519==
R version 2.8.0 Under development (unstable) (2008-07-07 r46046)
...
> for(i in 1:1000)sub("[a-c]","+","0abcd")
> q()
==32503==
==32503== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 40 from 2)
==32503== malloc/free: in use at exit: 12,603,409 bytes in 7,915 blocks.
==32503== malloc/free: 61,973 allocs, 54,058 frees, 54,494,371 bytes allocated.
==32503== For counts of detected errors, rerun with: -v
==32503== searching for pointers to 7,915 not-freed blocks.
==32503== checked 12,616,568 bytes.
==32503==
==32503== 4 bytes in 1 blocks are possibly lost in loss record 1 of 45
==32503== at 0x40046EE: malloc (vg_replace_malloc.c:149)
==32503== by 0x4005B9A: realloc (vg_replace_malloc.c:306)
==32503== by 0x80A5F92: parse_expression (regex.c:5202)
==32503== by 0x80A614F: parse_branch (regex.c:4707)
==32503== by 0x80A621A: parse_reg_exp (regex.c:4666)
==32503== by 0x80A6618: Rf_regcomp (regex.c:4635)
==32503== by 0x8110CB4: do_gsub (character.c:1355)
==32503== by 0x80654A4: do_internal (names.c:1135)
==32503== by 0x815F0EB: Rf_eval (eval.c:461)
==32503== by 0x8160DA7: do_begin (eval.c:1174)
==32503== by 0x815F0EB: Rf_eval (eval.c:461)
==32503== by 0x8162210: Rf_applyClosure (eval.c:667)
==32503==
... ignore 85 byte/4 block leak in readline ...
==32503== 7,980 bytes in 1,995 blocks are definitely lost in loss record 36 of 45
==32503== at 0x40046EE: malloc (vg_replace_malloc.c:149)
==32503== by 0x4005B9A: realloc (vg_replace_malloc.c:306)
==32503== by 0x80A5F92: parse_expression (regex.c:5202)
==32503== by 0x80A614F: parse_branch (regex.c:4707)
==32503== by 0x80A621A: parse_reg_exp (regex.c:4666)
==32503== by 0x80A6618: Rf_regcomp (regex.c:4635)
==32503== by 0x8110CB4: do_gsub (character.c:1355)
==32503== by 0x80654A4: do_internal (names.c:1135)
==32503== by 0x815F0EB: Rf_eval (eval.c:461)
==32503== by 0x8160DA7: do_begin (eval.c:1174)
==32503== by 0x815F0EB: Rf_eval (eval.c:461)
==32503== by 0x8162210: Rf_applyClosure (eval.c:667)
The leaked blocks are allocated in internal_function build_range_exp(), at
5200   /* Use realloc since mbcset->range_starts and mbcset->range_ends
5201      are NULL if *range_alloc == 0. */
5202   new_array_start = re_realloc (mbcset->range_starts, wchar_t,
5203                                 new_nranges);
5204   new_array_end = re_realloc (mbcset->range_ends, wchar_t,
5205                               new_nranges);
...
5210 mbcset->range_starts = new_array_start;
5211 mbcset->range_ends = new_array_end;
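For readers unfamiliar with the idiom in the excerpt above, here is a minimal sketch of growing two parallel arrays with realloc() starting from NULL pointers. It is not the regex.c code: struct range_set, range_set_add(), range_set_free(), and the 2n+1 growth step are assumptions made for illustration. The standard guarantees that realloc(NULL, n) behaves like malloc(n), which is what the quoted comment relies on.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical stand-in for the range fields of the charset structure. */
struct range_set {
    wchar_t *starts;   /* NULL until the first range is added */
    wchar_t *ends;     /* NULL until the first range is added */
    size_t   nranges;  /* number of ranges stored */
    size_t   alloc;    /* capacity of both arrays */
};

/* Append the range [lo, hi]. Returns 0 on success, -1 if allocation fails. */
static int range_set_add(struct range_set *rs, wchar_t lo, wchar_t hi)
{
    assert(lo <= hi);
    if (rs->nranges == rs->alloc) {
        /* Growth step is an assumption of this sketch. */
        size_t new_alloc = 2 * rs->alloc + 1;

        /* realloc(NULL, n) acts like malloc(n), so the empty case
           needs no special handling. */
        wchar_t *new_starts = realloc(rs->starts, new_alloc * sizeof *new_starts);
        if (new_starts == NULL)
            return -1;
        rs->starts = new_starts;

        wchar_t *new_ends = realloc(rs->ends, new_alloc * sizeof *new_ends);
        if (new_ends == NULL)
            return -1;
        rs->ends = new_ends;

        rs->alloc = new_alloc;
    }
    rs->starts[rs->nranges] = lo;
    rs->ends[rs->nranges] = hi;
    rs->nranges++;
    return 0;
}

/* Every array obtained through realloc() must be released here,
   unconditionally; free(NULL) is a no-op. */
static void range_set_free(struct range_set *rs)
{
    free(rs->starts);
    free(rs->ends);
    rs->starts = NULL;
    rs->ends = NULL;
    rs->nranges = 0;
    rs->alloc = 0;
}
```

A caller would start from a zero-initialised struct range_set, call range_set_add() once per range in the bracket expression, and call range_set_free() when the compiled pattern is discarded.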
This file, src/main/regex.c, contains a complicated mess of #ifdef's
but range_starts and range_ends are defined and appear to be used
whether or not _LIBC is defined. However, they are only freed if _LIBC
is defined. In my setup (Linux, gcc 3.4.5) _LIBC is not defined so
they don't get freed.
After the following change in free_charset() only the 85 byte/4 block
leak in readline remains.
Index: regex.c
===================================================================
--- regex.c (revision 46046)
+++ regex.c (working copy)
@@ -6240,9 +6240,9 @@
 # ifdef _LIBC
   re_free (cset->coll_syms);
   re_free (cset->equiv_classes);
+# endif
   re_free (cset->range_starts);
   re_free (cset->range_ends);
-# endif
   re_free (cset->char_classes);
   re_free (cset);
 }
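The effect of moving the "# endif" can be illustrated with a small self-contained sketch. Nothing here is taken from regex.c: DEMO_LIBC is a hypothetical stand-in for _LIBC (left undefined, as in the setup described above), struct demo_charset keeps only the two range arrays, and the counting allocator exists only so the leak can be measured without valgrind.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* DEMO_LIBC is a hypothetical stand-in for _LIBC and is deliberately
   left undefined in this sketch. */

static long live_blocks = 0;   /* blocks allocated and not yet freed */

static void *counted_alloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        live_blocks++;
    return p;
}

static void counted_free(void *p)
{
    if (p != NULL) {
        assert(live_blocks > 0);
        live_blocks--;
    }
    free(p);
}

/* Hypothetical reduced charset: only the two range arrays. */
struct demo_charset {
    wchar_t *range_starts;
    wchar_t *range_ends;
};

/* The range arrays are allocated whether or not DEMO_LIBC is defined. */
static struct demo_charset *demo_compile(void)
{
    struct demo_charset *cset = counted_alloc(sizeof *cset);
    if (cset == NULL)
        return NULL;
    cset->range_starts = counted_alloc(sizeof(wchar_t));
    cset->range_ends = counted_alloc(sizeof(wchar_t));
    return cset;
}

/* Shape of the code before the patch: range arrays freed only under the guard. */
static void free_charset_before(struct demo_charset *cset)
{
#ifdef DEMO_LIBC
    counted_free(cset->range_starts);
    counted_free(cset->range_ends);
#endif
    counted_free(cset);
}

/* Shape of the code after the patch: range arrays always freed. */
static void free_charset_after(struct demo_charset *cset)
{
    counted_free(cset->range_starts);
    counted_free(cset->range_ends);
    counted_free(cset);
}

/* Run ncalls compile/free cycles and return how many blocks free_fn left behind. */
static long count_leaked_blocks(int ncalls, void (*free_fn)(struct demo_charset *))
{
    long leaked = 0;
    for (int i = 0; i < ncalls; i++) {
        long base = live_blocks;
        struct demo_charset *cset = demo_compile();
        if (cset == NULL)
            continue;
        /* Remember the arrays so the sketch can tidy up after measuring. */
        wchar_t *starts = cset->range_starts;
        wchar_t *ends = cset->range_ends;
        free_fn(cset);
        long lost = live_blocks - base;
        leaked += lost;
        if (lost != 0) {
            /* free_fn skipped the arrays; release them so the demo itself is leak-free. */
            counted_free(starts);
            counted_free(ends);
        }
    }
    return leaked;
}
```

Under these assumptions, count_leaked_blocks(1000, free_charset_before) reports two blocks per cycle, while count_leaked_blocks(1000, free_charset_after) reports none.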
[This report may be a duplicate: I tried submitting it via the form in
http://bugs.r-project.org/cgi-bin/R, but I cannot find it there now.]
----------------------------------------------------------------------------
Bill Dunlap
Insightful Corporation
bill at insightful dot com
"All statements in this message represent the opinions of the author and do
not necessarily reflect Insightful Corporation policy or position."