[Rd] Inconsistency in gsub in R.2.6.2 (PR#10978)

ripley at stats.ox.ac.uk
Tue Mar 18 07:50:12 CET 2008


This has already been corrected in R-devel.  It was wrong to set the 
encoding of the result to that of the corresponding element of 'x': gsub 
will already have converted that element (to the native encoding or to UTF-8).
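The following is a toy C sketch, not R's internals. It assumes conversion to UTF-8 rather than to the native encoding, and `enc_t`, `tagged_str`, `latin1_to_utf8()` and `convert_and_mark()` are all hypothetical. It illustrates the point above: once a latin1 element has been re-encoded, its bytes are UTF-8. Copying the input element's latin1 mark onto the result, which is roughly what the proposed `markKnown(cbuf, STRING_ELT(vec, i))` change amounts to, would therefore mislabel those bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical encoding marks, loosely modelled on R's "unknown",
   "latin1" and "UTF-8" declarations; this is not R's internal API. */
typedef enum { ENC_UNKNOWN, ENC_LATIN1, ENC_UTF8 } enc_t;

/* Hypothetical string-plus-mark type standing in for an R CHARSXP. */
typedef struct {
    char bytes[64];
    enc_t enc;
} tagged_str;

/* Re-encode Latin-1 bytes as UTF-8: bytes below 0x80 are copied as-is,
   bytes 0x80-0xFF become a two-byte UTF-8 sequence. If out is too small
   the result is truncated at a character boundary and NUL-terminated. */
static void latin1_to_utf8(const char *in, char *out, size_t outsz) {
    assert(outsz > 0);
    size_t j = 0;
    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        if (*p < 0x80) {
            if (j + 1 >= outsz) break;   /* keep room for the NUL */
            out[j++] = (char)*p;
        } else {
            if (j + 2 >= outsz) break;
            out[j++] = (char)(0xC0 | (*p >> 6));
            out[j++] = (char)(0x80 | (*p & 0x3F));
        }
    }
    out[j] = '\0';
}

/* Toy version of the step under discussion: a Latin-1 element is
   re-encoded to UTF-8, so the result's mark must describe the bytes
   actually produced (ENC_UTF8). Copying x->enc here instead would
   label UTF-8 bytes as Latin-1. */
static tagged_str convert_and_mark(const tagged_str *x) {
    tagged_str r;
    if (x->enc == ENC_LATIN1) {
        latin1_to_utf8(x->bytes, r.bytes, sizeof r.bytes);
        r.enc = ENC_UTF8;
    } else {
        /* Bytes are left untouched, so the original mark still applies. */
        memcpy(r.bytes, x->bytes, sizeof r.bytes);
        r.enc = x->enc;
    }
    return r;
}
```

For the Latin-1 input "ab\xE4", `convert_and_mark()` returns the four bytes 61 62 C3 A4 marked `ENC_UTF8`. Keeping `ENC_LATIN1` instead would declare those bytes to be Latin-1, and they would be read back as "abÃ¤".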

On Mon, 17 Mar 2008, christian.buchta at wu-wien.ac.at wrote:

>
> Hi,
>
> Could this be an oversight?
>
> R version 2.6.2 Patched (2008-03-13 r44783)
> Copyright (C) 2008 The R Foundation for Statistical Computing
> ISBN 3-900051-07-0
>
> ...
>
> > x <- "abä"
> > Encoding(x)
> [1] "latin1"
> > Encoding(gsub("ä","", x))
> [1] "unknown"
> > Encoding(gsub("ä","", x, perl = TRUE))
> [1] "latin1"
>
> The code in src/main/pcre.c (see also do_tolower and do_strsplit in
> src/main/character.c) suggests the attached patch.
>
> > x <- "abä"
> > Encoding(gsub("ä","", x))
> [1] "latin1"
>
>
> Happy Easter
>
> Christian
>
> --
> Christian Buchta -> Institute for Tourism and Leisure Studies ->
> Vienna University of Economics and Business Administration -> Vienna
> -> Austria -> Europe. Visit us on http://www.wu-wien.ac.at/itf/.
>
>
> Index: src/main/character.c
> ===================================================================
> --- src/main/character.c	(revision 44783)
> +++ src/main/character.c	(working copy)
> @@ -1281,7 +1281,7 @@
> 		    strcat(u, t);
> 		} while(global && (st = fgrep_one_bytes(spat, s, useBytes)) >= 0);
> 		strcat(u, s);
> -                SET_STRING_ELT(ans, i, mkChar(cbuf));
> +                SET_STRING_ELT(ans, i, markKnown(cbuf, STRING_ELT(vec, i)));
>                 Free(cbuf);
> 	    }
> 	} else {
> @@ -1337,7 +1337,7 @@
> 		    for (j = offset ; s[j] ; j++)
> 			*u++ = s[j];
> 		*u = '\0';
> -                SET_STRING_ELT(ans, i, mkChar(cbuf));
> +                SET_STRING_ELT(ans, i, markKnown(cbuf, STRING_ELT(vec, i)));
>                 Free(cbuf);
> 	    }
> 	}
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595