[Rd] problem with gsub in CP932 and SJIS locales (PR#9751)
nakama at ki.rim.or.jp
Mon Jun 25 12:08:31 CEST 2007
Thanks.
As for mbs_init, I agree it should be called outside the loop.
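Here is a minimal sketch of that change applied to the second hunk of the patch quoted below. One conversion state is set up before the loop and carried across every mbrtowc() call, and each multibyte character is copied through whole, so a 0x5c trail byte is never read as a backslash. This is standalone C, not R's character.c. memset() and the standard mbrtowc() stand in for R's mbs_init() and Mbrtowc(), POSIX regmatch_t supplies the match offsets, and the name copy_replacement, its arguments, and the choice to copy invalid bytes through unchanged are all assumptions for illustration.

```c
#include <regex.h>
#include <stdlib.h>   /* MB_CUR_MAX */
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: copy the replacement string repl into target,
   expanding \1..\9 from the match offsets in regmatch (at least 10
   entries) against the subject string s. The caller must make target
   large enough. */
char *copy_replacement(const char *s, const char *repl,
                       const regmatch_t *regmatch, char *target)
{
    mbstate_t mb_st;
    memset(&mb_st, 0, sizeof mb_st);   /* once per string, not per character */

    const char *p = repl;
    char *t = target;
    while (*p) {
        size_t clen = mbrtowc(NULL, p, MB_CUR_MAX, &mb_st);
        if (clen == (size_t)-1 || clen == (size_t)-2) {
            /* Invalid or incomplete sequence: copy one byte as-is and
               reset the state (an assumption, not R's behaviour). */
            memset(&mb_st, 0, sizeof mb_st);
            clen = 1;
        }
        if (clen > 1) {
            /* Whole multibyte character: its trail bytes may be 0x5c,
               so they are never tested for a backslash. */
            memcpy(t, p, clen);
            t += clen;
            p += clen;
            continue;
        }
        if (p[0] == '\\' && '1' <= p[1] && p[1] <= '9') {
            int k = p[1] - '0';
            if (regmatch[k].rm_so >= 0) {   /* group took part in the match */
                size_t len = (size_t)(regmatch[k].rm_eo - regmatch[k].rm_so);
                memcpy(t, s + regmatch[k].rm_so, len);
                t += len;
            }
            p += 2;
            continue;
        }
        *t++ = *p++;
    }
    *t = '\0';
    return target;
}
```

Called with the subject "abbbc", the extended regex "(b+)" and the replacement "[\\1]" (C string syntax), it writes "[bbb]"; a reference to a group that did not take part in the match, such as \2 here, expands to nothing.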
Here is code that reproduces the problem:
> gsub("A","\u30bd\u8868","A")
In EUC-JP and UTF-8 locales it works without a problem.
> Sys.getlocale("LC_CTYPE") # SHIFT_JIS system.
[1] "ja_JP.SJIS"
> charToRaw("\u30bd\u8868") # the second byte of each character is 0x5c
[1] 83 5c 95 5c
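The charToRaw() output shows why this breaks. In Shift_JIS both characters carry 0x5c, the ASCII backslash, as their second byte, so code that walks the replacement string one byte at a time finds two escapes that are not there. In EUC-JP and UTF-8 every byte of a multibyte character is 0x80 or above, which is why those locales are unaffected. The locale-independent C sketch below contrasts a naive byte scan with one that steps over two-byte characters. The helper names are hypothetical, the lead-byte ranges (0x81 to 0x9F and 0xE0 to 0xFC) are the ones commonly used for CP932, and the patch itself relies on R's Mbrtowc() rather than hard-coded ranges.

```c
#include <stddef.h>

/* Naive scan: every 0x5c byte is taken to be a backslash. */
int count_backslashes_naive(const unsigned char *s, size_t len)
{
    int n = 0;
    for (size_t i = 0; i < len; i++)
        if (s[i] == 0x5c)
            n++;
    return n;
}

/* Assumed CP932 lead-byte ranges: such a byte starts a two-byte character. */
int is_sjis_lead(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9f) || (c >= 0xe0 && c <= 0xfc);
}

/* Lead-byte-aware scan: steps over the trail byte of each two-byte
   character, so a 0x5c trail byte is never counted. */
int count_backslashes_sjis(const unsigned char *s, size_t len)
{
    int n = 0;
    for (size_t i = 0; i < len; i++) {
        if (is_sjis_lead(s[i]) && i + 1 < len) {
            i++;            /* skip the trail byte */
            continue;
        }
        if (s[i] == 0x5c)
            n++;
    }
    return n;
}
```

Fed the bytes 83 5c 95 5c from charToRaw() above, the naive scan reports 2 backslashes and the lead-byte-aware scan reports 0. The UTF-8 bytes of the same two characters, e3 82 bd e8 a1 a8, give 0 even with the naive scan.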
2007/6/25, Prof Brian Ripley <ripley at stats.ox.ac.uk>:
> Thanks for this.
>
> I don't think the patch is quite right. As I understand it, mbstate_t
> should be initialized at the start of the string, not before each
> character, and that is what is done in the rest of R.
>
> Also, do you have an example I can use to test the patch, please?
>
> R 2.5.0 is now in code freeze and I don't think this is vital for that.
>
>
> On Sun, 24 Jun 2007, nakama at ki.rim.or.jp wrote:
>
> > Full_Name: Ei-ji Nakama
> > Version: R-2.5.0
> > OS: any
> > Submission from: (NULL) (219.117.236.5)
> >
> >
> > gsub() mishandles the replacement string in CP932 and SJIS locales,
> > where some double-byte characters have 0x5c (the ASCII backslash) as
> > their second byte.
> >
> > --- R-2.5.0.orig/src/main/character.c 2007-04-03 11:05:05.000000000 +0900
> > +++ R-2.5.0/src/main/character.c 2007-06-24 22:31:06.000000000 +0900
> > @@ -986,6 +986,17 @@
> > char *p = repl;
> > n = strlen(repl) - (regmatch[0].rm_eo - regmatch[0].rm_so);
> > while (*p) {
> > +#ifdef SUPPORT_MBCS
> > + if(mbcslocale){
> > + int clen;
> > + mbstate_t mb_st;
> > + mbs_init(&mb_st);
> > + if((clen = Mbrtowc(NULL, p, MB_CUR_MAX, &mb_st)) > 1){
> > + p+=clen;
> > + continue;
> > + }
> > + }
> > +#endif
> > if (*p == '\\') {
> > if ('1' <= p[1] && p[1] <= '9') {
> > k = p[1] - '0';
> > @@ -1014,6 +1025,18 @@
> > int i, k;
> > char *p = repl, *t = target;
> > while (*p) {
> > +#ifdef SUPPORT_MBCS
> > + if(mbcslocale){
> > + int clen;
> > + mbstate_t mb_st;
> > + mbs_init(&mb_st);
> > + if((clen = Mbrtowc(NULL, p, MB_CUR_MAX, &mb_st)) > 1){
> > + for ( i=0; i<clen; i++)
> > + *t++ = *p++;
> > + continue;
> > + }
> > + }
> > +#endif
> > if (*p == '\\') {
> > if ('1' <= p[1] && p[1] <= '9') {
> > k = p[1] - '0';
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
> --
> Brian D. Ripley, ripley at stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595
>
>
>
-- 
EI-JI Nakama <nakama at ki.rim.or.jp>
"中間栄治" <nakama at ki.rim.or.jp>