[Rd] Inconsistency in gsub in R 2.6.2 (PR#10978)
ripley at stats.ox.ac.uk
Tue Mar 18 07:50:12 CET 2008
This has already been corrected in R-devel. It was wrong to set the
encoding to that of the element of 'x': gsub will have changed it (to
native or UTF-8).
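The following minimal C sketch makes the point concrete. It is not R's code: `Enc`, `TaggedStr`, `xmalloc`, `latin1_to_utf8` and `remove_all` are hypothetical stand-ins for R's encoding marks and the fixed-string branch of gsub. It assumes the pattern is given in UTF-8 and that latin1 input is always converted to UTF-8 (R may convert to the native encoding instead). It also ignores R's rule that pure-ASCII strings are never marked. What it illustrates is the rule in the reply above: once the bytes have been re-encoded, the result must be marked with the encoding they are now in, not with a mark copied from the input element.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical encoding marks, loosely modelled on R's "unknown", "latin1", "UTF-8". */
typedef enum { ENC_UNKNOWN, ENC_LATIN1, ENC_UTF8 } Enc;

/* A NUL-terminated byte string plus the mark saying how to interpret its bytes. */
typedef struct {
    char *bytes;
    Enc enc;
} TaggedStr;

/* Allocate or abort, so the sketch need not thread error codes around. */
static char *xmalloc(size_t n) {
    char *p = malloc(n);
    if (p == NULL) abort();
    return p;
}

/* Re-encode Latin-1 as UTF-8: bytes below 0x80 are copied, 0x80-0xFF become two bytes. */
static char *latin1_to_utf8(const char *in) {
    char *out = xmalloc(2 * strlen(in) + 1);
    char *o = out;
    for (const unsigned char *s = (const unsigned char *) in; *s; s++) {
        if (*s < 0x80) {
            *o++ = (char) *s;
        } else {
            *o++ = (char) (0xC0 | (*s >> 6));
            *o++ = (char) (0x80 | (*s & 0x3F));
        }
    }
    *o = '\0';
    return out;
}

/* Hypothetical fixed-string "gsub": delete every occurrence of a non-empty UTF-8 pattern. */
static TaggedStr remove_all(TaggedStr x, const char *pat_utf8) {
    assert(pat_utf8 != NULL && *pat_utf8 != '\0');

    /* Step 1: bring latin1 input into UTF-8 so it can be compared with the pattern. */
    char *src;
    if (x.enc == ENC_LATIN1) {
        src = latin1_to_utf8(x.bytes);
    } else {
        src = xmalloc(strlen(x.bytes) + 1);
        strcpy(src, x.bytes);
    }

    /* Step 2: copy src, skipping each match of the pattern. */
    size_t plen = strlen(pat_utf8);
    char *out = xmalloc(strlen(src) + 1);
    char *o = out;
    for (const char *s = src; *s; ) {
        if (strncmp(s, pat_utf8, plen) == 0) {
            s += plen;
        } else {
            *o++ = *s++;
        }
    }
    *o = '\0';
    free(src);

    /* Step 3: mark the result by what its bytes now are. Copying x.enc here
       (latin1) would mislabel UTF-8 bytes, which is the error described above. */
    TaggedStr r = { out, x.enc == ENC_LATIN1 ? ENC_UTF8 : x.enc };
    return r;
}
```

For a latin1 input "aäbäc", removing "b" yields the UTF-8 bytes of "aääc" marked `ENC_UTF8`. Marking that result `ENC_LATIN1` instead would make each ä read back as "Ã¤".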
On Mon, 17 Mar 2008, christian.buchta at wu-wien.ac.at wrote:
> Hi,
>
> Might this be an oversight?
>
> R version 2.6.2 Patched (2008-03-13 r44783)
> Copyright (C) 2008 The R Foundation for Statistical Computing
> ISBN 3-900051-07-0
>
> ...
>
> > x <- "abä"
> > Encoding(x)
> [1] "latin1"
> > Encoding(gsub("ä","", x))
> [1] "unknown"
> > Encoding(gsub("ä","", x, perl = TRUE))
> [1] "latin1"
>
> The code in src/main/pcre.c (see also do_tolower and do_strsplit in
> src/main/character.c) suggests the attached patch. With it applied:
>
> > x <- "abä"
> > Encoding(gsub("ä","", x))
> [1] "latin1"
>
>
> Happy Easter
>
> Christian
>
> --
> Christian Buchta -> Institute for Tourism and Leisure Studies ->
> Vienna University of Economics and Business Administration -> Vienna
> -> Austria -> Europe. Visit us on http://www.wu-wien.ac.at/itf/.
>
> Index: src/main/character.c
> ===================================================================
> --- src/main/character.c (revision 44783)
> +++ src/main/character.c (working copy)
> @@ -1281,7 +1281,7 @@
> strcat(u, t);
> } while(global && (st = fgrep_one_bytes(spat, s, useBytes)) >= 0);
> strcat(u, s);
> - SET_STRING_ELT(ans, i, mkChar(cbuf));
> + SET_STRING_ELT(ans, i, markKnown(cbuf, STRING_ELT(vec, i)));
> Free(cbuf);
> }
> } else {
> @@ -1337,7 +1337,7 @@
> for (j = offset ; s[j] ; j++)
> *u++ = s[j];
> *u = '\0';
> - SET_STRING_ELT(ans, i, mkChar(cbuf));
> + SET_STRING_ELT(ans, i, markKnown(cbuf, STRING_ELT(vec, i)));
> Free(cbuf);
> }
> }
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595