[Rd] problem gsub in the locale of CP932 and SJIS (PR#9751)

nakama at ki.rim.or.jp nakama at ki.rim.or.jp
Mon Jun 25 12:08:31 CEST 2007


Thanks.

I agree that mbs_init should be called outside the loop.
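
As a minimal sketch of that arrangement (not the code in character.c), the
following uses the standard C mbrtowc() rather than R's Mbrtowc() wrapper and
initialises the conversion state once, before the loop. The helper name
count_backslashes is hypothetical, and treating an invalid sequence as a
single byte is an assumption made only for this example.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count backslashes that are genuine single-byte
   characters in the current locale, stepping over multibyte characters. */
static int count_backslashes(const char *s)
{
    mbstate_t st;
    const char *end;
    int count = 0;

    assert(s != NULL);
    end = s + strlen(s);
    memset(&st, 0, sizeof st);      /* initial state, set once per string */

    while (*s) {
        size_t clen = mbrtowc(NULL, s, (size_t)(end - s), &st);
        if (clen == (size_t)-1 || clen == (size_t)-2) {
            /* Assumption for this sketch: on an invalid or incomplete
               sequence, consume one byte and return to the initial state. */
            memset(&st, 0, sizeof st);
            clen = 1;
        }
        if (clen == 0)              /* defensive: decoded a null character */
            break;
        if (clen == 1 && *s == '\\')
            count++;
        s += clen;                  /* skip the whole character */
    }
    return count;
}
```

Called after setlocale(LC_CTYPE, ""), this only tests for '\\' at positions
where a character starts, which is the behaviour the patch is aiming for.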

The code that shows the problem is:
> gsub("A","\u30bd\u8868","A")

It works without a problem in EUC-JP and UTF-8 locales.

> Sys.getlocale("LC_CTYPE")    # SHIFT_JIS system.
[1] "ja_JP.SJIS"
> charToRaw("\u30bd\u8868")   # The second byte of each character is 0x5c
[1] 83 5c 95 5c
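
To show why those bytes are a problem, here is a small sketch that scans the
same four bytes in two ways. It does not use the locale machinery at all: the
lead-byte ranges in is_sjis_lead (0x81-0x9F and 0xE0-0xFC) are an assumption
about Shift_JIS/CP932 coded into the example, and all function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: lead bytes of two-byte characters in Shift_JIS / CP932. */
static int is_sjis_lead(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Byte-wise scan: every 0x5c byte is taken to be a backslash. */
static int count_backslash_bytes(const unsigned char *s, size_t n)
{
    size_t i;
    int count = 0;

    assert(s != NULL);
    for (i = 0; i < n; i++)
        if (s[i] == 0x5C)
            count++;
    return count;
}

/* Character-wise scan: a trail byte is never examined on its own. */
static int count_backslash_chars(const unsigned char *s, size_t n)
{
    size_t i = 0;
    int count = 0;

    assert(s != NULL);
    while (i < n) {
        if (is_sjis_lead(s[i]) && i + 1 < n) {
            i += 2;                 /* skip lead byte and trail byte */
            continue;
        }
        if (s[i] == 0x5C)
            count++;
        i++;
    }
    return count;
}
```

For the bytes 83 5c 95 5c the byte-wise scan reports two backslashes and the
character-wise scan reports none, so a byte-wise loop over the replacement
string would treat both trail bytes as escape characters.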

2007/6/25, Prof Brian Ripley <ripley at stats.ox.ac.uk>:
> Thanks for this.
>
> I don't think the patch is quite right.  As I understand it, mbstate_t
> should be initialized at the start of the string, not before each
> character, and that is what is done in the rest of R.
>
> Also, do you have an example I can use to test the patch, please?
>
> R 2.5.0 is now in code freeze and I don't think this is vital for that.
>
>
> On Sun, 24 Jun 2007, nakama at ki.rim.or.jp wrote:
>
> > Full_Name: Ei-ji Nakama
> > Version: R-2.5.0
> > OS: any
> > Submission from: (NULL) (219.117.236.5)
> >
> >
> > There is a problem with gsub in the CP932 and SJIS locales.
> > The problematic characters are those with 0x5c as the second byte.
> >
> > --- R-2.5.0.orig/src/main/character.c   2007-04-03 11:05:05.000000000 +0900
> > +++ R-2.5.0/src/main/character.c        2007-06-24 22:31:06.000000000 +0900
> > @@ -986,6 +986,17 @@
> >     char *p = repl;
> >     n = strlen(repl) - (regmatch[0].rm_eo - regmatch[0].rm_so);
> >     while (*p) {
> > +#ifdef  SUPPORT_MBCS
> > +       if(mbcslocale){
> > +           int clen;
> > +           mbstate_t mb_st;
> > +           mbs_init(&mb_st);
> > +           if((clen = Mbrtowc(NULL, p, MB_CUR_MAX, &mb_st)) > 1){
> > +               p+=clen;
> > +               continue;
> > +           }
> > +       }
> > +#endif
> >        if (*p == '\\') {
> >            if ('1' <= p[1] && p[1] <= '9') {
> >                k = p[1] - '0';
> > @@ -1014,6 +1025,18 @@
> >     int i, k;
> >     char *p = repl, *t = target;
> >     while (*p) {
> > +#ifdef  SUPPORT_MBCS
> > +       if(mbcslocale){
> > +           int clen;
> > +           mbstate_t mb_st;
> > +           mbs_init(&mb_st);
> > +           if((clen = Mbrtowc(NULL, p, MB_CUR_MAX, &mb_st)) > 1){
> > +               for ( i=0; i<clen; i++)
> > +                   *t++ = *p++;
> > +               continue;
> > +           }
> > +       }
> > +#endif
> >        if (*p == '\\') {
> >            if ('1' <= p[1] && p[1] <= '9') {
> >                k = p[1] - '0';
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
> --
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>
>
>


-- 
EI-JI Nakama  <nakama at ki.rim.or.jp>
"中間栄治"  <nakama at ki.rim.or.jp>


