[Rd] Using \u2030 in plot axis label -> stack smashing

Tue Sep 19 12:48:03 CEST 2006

On Tue, 2006-09-19 at 08:26 +0100, Prof Brian Ripley wrote:
> I didn't have access to my FC5 boxes yesterday (electrical testing).
> 
> This does need the FC5-specific compilation options set
> (-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector
> --param=ssp-buffer-size=4), so it is not surprising it is not 
> reproducible elsewhere (including under valgrind, BTW).
> 
> Ei-ji's patch works (and is incorporated now), but the buffer is used at
> 
>      strncpy(s, buf, sizeof(buf) - 1); /* ensure 0-terminated */
> 
> and the 's' here should be big enough (\uxxxx can only expand to 3 bytes 
> in UTF-8, so "\u2030" is four bytes in UTF-8 including the null 
> terminator).  Can Ei-ji explain?
> 
> I can understand how Gavin saw this in the released FC5 RPM.  What I don't 
> understand is how he saw this in 2.4.0 alpha/R-devel without setting 
> non-default CFLAGS he did not tell us about.

Thanks Prof. Ripley and Ei-ji. I should have mentioned that all the
versions I reported were self-compiled, and I built them with the same
set of flags as the FC5 RPM. I will add that to my mental list of
things to report.

> 
> BTW, just applying this patch will not work: you need to rebuild gram.c 
> in maintainer mode.

I'm not clear what you mean by maintainer mode - not something I have
come across before. If I update the local source on my machine from the
svn server, and make clean, configure and make again, will this be
sufficient? Or do I need to do something else?

Many thanks,

G

> 
> 
> On Tue, 19 Sep 2006, Ei-ji Nakama wrote:
> 
> > This looks like a landmine I planted myself. m(_|_)m
> >
> > --- R-alpha.orig/src/main/gram.y        2006-09-04 23:41:33.000000000 +0900
> > +++ R-alpha/src/main/gram.y     2006-09-19 13:01:41.000000000 +0900
> > @@ -99,11 +99,12 @@
> > # endif
> > #endif
> > #include <errno.h>
> > +#define MB_BUF 16
> >
> > static size_t ucstomb(char *s, wchar_t wc, mbstate_t *ps)
> > {
> >     char     tocode[128];
> > -    char     buf[16];
> > +    char     buf[MB_BUF];
> >     void    *cd = NULL ;
> >     wchar_t  wcs[2];
> >     char    *inbuf = (char *) wcs;
> > @@ -1709,7 +1710,7 @@
> >                error(_("\\uxxxx sequences not supported"));
> > #else
> >                wint_t val = 0; int i, ext; size_t res;
> > -               char buff[5]; Rboolean delim = FALSE;
> > +               char buff[MB_BUF]; Rboolean delim = FALSE;
> >                if((c = xxgetc()) == '{') delim = TRUE; else xxungetc(c);
> >                for(i = 0; i < 4; i++) {
> >                    c = xxgetc();
> > @@ -1743,7 +1744,7 @@
> > #ifdef SUPPORT_MBCS
> >                else {
> >                    wint_t val = 0; int i, ext; size_t res;
> > -                   char buff[9]; Rboolean delim = FALSE;
> > +                   char buff[MB_BUF]; Rboolean delim = FALSE;
> >                    if((c = xxgetc()) == '{') delim = TRUE; else xxungetc(c);
> >                    for(i = 0; i < 8; i++) {
> >                        c = xxgetc();
> >
> >
> > 2006/9/19, Gregor Gorjanc <gregor.gorjanc at bfro.uni-lj.si>:
> >> Gavin Simpson wrote:
> >>> On Mon, 2006-09-18 at 19:02 +0000, Gregor Gorjanc wrote:
> >>>> Gavin Simpson <gavin.simpson <at> ucl.ac.uk> writes:
> >>>>> Dear List
> >>>>>
> >>>>> I just noticed the following behaviour in R 2.3.1 Patched (2006-06-13
> >>>>> r38342) and confirmed similar behaviour in R 2.4.0 alpha (2006-09-18
> >>>>> r39383) and R 2.5.0 (2006-09-18 r39383) (which may actually be the same
> >>>>> build): trying to plot the Unicode character \u2030 (which should
> >>>>> be a ‰ [per mille] sign) in an axis label leads to the following
> >>>>> error:
> >>>>>
> >>>>> *** stack smashing detected ***: /home/gavin/R/R-devel/build/bin/exec/R
> >>>>> terminated
> >>>>> Aborted
> >>>>>
> >>>>> The simplest, reproducible example I have tried is:
> >>>>>
> >>>>> plot(1:10, ylab = "\u2030")
> >>>>>
> >>>> I cannot reproduce this on my Debian GNU/Linux system. I get something like "S
> >>>> for y label under 2.3.1 2006-06-01 and 2.5.0 2006-09-13 r39292 with the
> >>>> following locale
> >>>>
> >>>> [1] "LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;
> >>>> LC_COLLATE=en_GB.UTF-8;LC_MONETARY=en_GB.UTF-8;LC_MESSAGES=en_GB.UTF-8;
> >>>> LC_PAPER=C;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C;
> >>>> LC_IDENTIFICATION=C"
> >>>>
> >>>> It does not change if I set everything to en_GB.UTF-8. Is this a valid
> >>>> Unicode code point?
> >>>>
> >>>> Gregor
> >>>
> >>> Cheers for the follow-up, Gregor,
> >>>
> >>> I was following advice given by Prof. Ripley in a posting on R-Help
> >>> about how to get the per mille character:
> >>>
> >>> http://finzi.psych.upenn.edu/R/Rhelp02a/archive/48709.html
> >>>
> >>> It should look like a "%" character but with two circles at the bottom.
> >>
> >> Perhaps I do not have an appropriate font for this character.
> >>
> >> Gregor
> >>
> >> ______________________________________________
> >> R-devel at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>
> >
> >
> >
> 
> -- 
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Gavin Simpson                 [t] +44 (0)20 7679 0522
 ECRC & ENSIS, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%