[Rd] memory leak in sub("[range]",...)

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Jul 29 15:06:45 CEST 2008


As a belated follow-up (I was away at the time), note that in general we 
don't tamper with code we have ported from other projects as it makes 
future maintenance so much more difficult.  At the very least, we need 
conspicuous comments to ensure that such changes do not get lost (I've 
just added one).

On Tue, 15 Jul 2008, Martin Maechler wrote:

>>>>>> "BD" == Bill Dunlap <bill at insightful.com>
>>>>>>     on Wed, 9 Jul 2008 11:26:50 -0700 (PDT) writes:
>
>    BD> There is a 2-block memory leak in sub() (and probably in any other
>    BD> regex-related function) when the pattern argument involves a range
>    BD> expression, e.g., '[0-9]'.
>
>    BD> % R --debugger=valgrind --debugger-args=--leak-check=full --vanilla
>    BD> ==14519== Memcheck, a memory error detector.
>    BD> ==14519== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
>    BD> ==14519== Using LibVEX rev 1658, a library for dynamic binary translation.
>    BD> ==14519== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
>    BD> ==14519== Using valgrind-3.2.1, a dynamic binary instrumentation framework.
>    BD> ==14519== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
>    BD> ==14519== For more details, rerun with: -v
>    BD> ==14519==
>
>    BD> R version 2.8.0 Under development (unstable) (2008-07-07 r46046)
>    BD> ...
>    >> for(i in 1:1000)sub("[a-c]","+","0abcd")
>    >> q()
>    BD> ==32503==
>    BD> ==32503== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 40 from 2)
>    BD> ==32503== malloc/free: in use at exit: 12,603,409 bytes in 7,915 blocks.
>    BD> ==32503== malloc/free: 61,973 allocs, 54,058 frees, 54,494,371 bytes allocated.
>    BD> ==32503== For counts of detected errors, rerun with: -v
>    BD> ==32503== searching for pointers to 7,915 not-freed blocks.
>    BD> ==32503== checked 12,616,568 bytes.
>    BD> ==32503==
>    BD> ==32503== 4 bytes in 1 blocks are possibly lost in loss record 1 of 45
>    BD> ==32503==    at 0x40046EE: malloc (vg_replace_malloc.c:149)
>    BD> ==32503==    by 0x4005B9A: realloc (vg_replace_malloc.c:306)
>    BD> ==32503==    by 0x80A5F92: parse_expression (regex.c:5202)
>    BD> ==32503==    by 0x80A614F: parse_branch (regex.c:4707)
>    BD> ==32503==    by 0x80A621A: parse_reg_exp (regex.c:4666)
>    BD> ==32503==    by 0x80A6618: Rf_regcomp (regex.c:4635)
>    BD> ==32503==    by 0x8110CB4: do_gsub (character.c:1355)
>    BD> ==32503==    by 0x80654A4: do_internal (names.c:1135)
>    BD> ==32503==    by 0x815F0EB: Rf_eval (eval.c:461)
>    BD> ==32503==    by 0x8160DA7: do_begin (eval.c:1174)
>    BD> ==32503==    by 0x815F0EB: Rf_eval (eval.c:461)
>    BD> ==32503==    by 0x8162210: Rf_applyClosure (eval.c:667)
>    BD> ==32503==
>    BD> ... ignore 85 byte/4 block leak in readline ...
>    BD> ==32503== 7,980 bytes in 1,995 blocks are definitely lost in loss record 36 of 45
>    BD> ==32503==    at 0x40046EE: malloc (vg_replace_malloc.c:149)
>    BD> ==32503==    by 0x4005B9A: realloc (vg_replace_malloc.c:306)
>    BD> ==32503==    by 0x80A5F92: parse_expression (regex.c:5202)
>    BD> ==32503==    by 0x80A614F: parse_branch (regex.c:4707)
>    BD> ==32503==    by 0x80A621A: parse_reg_exp (regex.c:4666)
>    BD> ==32503==    by 0x80A6618: Rf_regcomp (regex.c:4635)
>    BD> ==32503==    by 0x8110CB4: do_gsub (character.c:1355)
>    BD> ==32503==    by 0x80654A4: do_internal (names.c:1135)
>    BD> ==32503==    by 0x815F0EB: Rf_eval (eval.c:461)
>    BD> ==32503==    by 0x8160DA7: do_begin (eval.c:1174)
>    BD> ==32503==    by 0x815F0EB: Rf_eval (eval.c:461)
>    BD> ==32503==    by 0x8162210: Rf_applyClosure (eval.c:667)
>
>    BD> The leaked blocks are allocated in internal_function build_range_exp(), at
>    BD> 5200             /* Use realloc since mbcset->range_starts and mbcset->range_ends
>    BD> 5201                are NULL if *range_alloc == 0.  */
>    BD> 5202             new_array_start = re_realloc (mbcset->range_starts, wchar_t,
>    BD> 5203                                           new_nranges);
>    BD> 5204             new_array_end = re_realloc (mbcset->range_ends, wchar_t,
>    BD> 5205                                         new_nranges);
>    BD> ...
>    BD> 5210             mbcset->range_starts = new_array_start;
>    BD> 5211             mbcset->range_ends = new_array_end;
>
>    BD> This file, src/main/regex.c, contains a complicated mess of #ifdef's
>
> ((note that these were not
>
>    BD> but range_starts and range_ends are defined and appear to be used
>    BD> whether or not _LIBC is defined.  However, they are only freed if _LIBC
>    BD> is defined.  In my setup (Linux, gcc 3.4.5) _LIBC is not defined so
>    BD> they don't get freed.
>
> OK, this all makes sense; I've seen the same in the source.
>
> Interestingly, my newer setup (Linux, gcc 4.2.x ...) does not show the
> memory leak; I've not checked if it's because _LIBC is defined
> or for another reason.
>
> I'm applying your patch ---  thank you, Bill.
> Martin
>
>    BD> After the following change in free_charset(), only the 85-byte/4-block
>    BD> leak in readline remains.
>
>    BD> Index: regex.c
>    BD> ===================================================================
>    BD> --- regex.c     (revision 46046)
>    BD> +++ regex.c     (working copy)
>    BD> @@ -6240,9 +6240,9 @@
>    BD>  # ifdef _LIBC
>    BD>    re_free (cset->coll_syms);
>    BD>    re_free (cset->equiv_classes);
>    BD> +# endif
>    BD>    re_free (cset->range_starts);
>    BD>    re_free (cset->range_ends);
>    BD> -# endif
>    BD>    re_free (cset->char_classes);
>    BD>    re_free (cset);
>    BD>  }
>
>    BD> [This report may be a duplicate: I tried submitting it via the form in
>    BD> http://bugs.r-project.org/cgi-bin/R, but I cannot find it there now.]
>
> neither do I.
> The machine running the repository had a downtime (announced by Peter
> Dalgaard) a couple of days ago, so this may be related.
>
>
>    BD> ----------------------------------------------------------------------------
>    BD> Bill Dunlap
>    BD> Insightful Corporation
>    BD> bill at insightful dot com
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


