[Rd] Inconsistency in gsub in R.2.6.2 (PR#10978)
christian.buchta at wu-wien.ac.at
Mon Mar 17 21:55:12 CET 2008
Hi,
Could this be an oversight?
R version 2.6.2 Patched (2008-03-13 r44783)
Copyright (C) 2008 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
...
> x <- "abä"
> Encoding(x)
[1] "latin1"
> Encoding(gsub("ä","", x))
[1] "unknown"
> Encoding(gsub("ä","", x, perl = TRUE))
[1] "latin1"
The code in src/main/pcre.c (see also do_tolower and do_strsplit in
src/main/character.c) suggests patching as in the attached diff. With the
patch applied, the result keeps the input's encoding:
> x <- "abä"
> Encoding(gsub("ä","", x))
[1] "latin1"
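As the diff suggests, the two hunks in character.c build each result element with mkChar(cbuf), which leaves the encoding mark unset ("unknown"), whereas the patch passes the input element STRING_ELT(vec, i) to markKnown so that its known encoding is carried over, as pcre.c already does. The standalone C sketch below models only that difference. The names enc_t, tagged_str and remove_all are hypothetical and are not R's internal API; the sketch removes a literal byte string (replacement "") and does no regular-expression matching.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an R CHARSXP: bytes plus a declared
   encoding mark. This is NOT R's internal API. */
typedef enum { ENC_UNKNOWN, ENC_LATIN1, ENC_UTF8 } enc_t;

typedef struct {
    char *bytes; /* heap-allocated, NUL-terminated */
    enc_t enc;   /* declared encoding of the bytes */
} tagged_str;

/* Remove every occurrence of the literal byte string pat from in.
   keep_enc == 0: the result is marked unknown, like mkChar(cbuf).
   keep_enc != 0: the result inherits in_enc, modelling
                  markKnown(cbuf, STRING_ELT(vec, i)) in the patch. */
static tagged_str remove_all(const char *in, enc_t in_enc,
                             const char *pat, int keep_enc)
{
    tagged_str res = { NULL, ENC_UNKNOWN };
    size_t plen, n;
    char *u;

    assert(in != NULL && pat != NULL);
    plen = strlen(pat);
    n = strlen(in);
    res.bytes = malloc(n + 1); /* output is never longer than input */
    if (res.bytes == NULL)
        return res;

    u = res.bytes;
    while (*in) {
        if (plen > 0 && strncmp(in, pat, plen) == 0)
            in += plen;   /* drop the match */
        else
            *u++ = *in++; /* copy one byte unchanged */
    }
    *u = '\0';

    /* The only difference between the two behaviours: */
    res.enc = keep_enc ? in_enc : ENC_UNKNOWN;
    return res;
}
```

Called on the Latin-1 bytes "ab\xE4" with pattern "\xE4", both variants return "ab"; with keep_enc = 0 the mark is ENC_UNKNOWN (like the reported gsub(..., perl = FALSE) result), and with keep_enc = 1 it stays ENC_LATIN1 (like perl = TRUE and the patched code). The caller frees res.bytes.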
Happy Easter
Christian
--
Christian Buchta -> Institute for Tourism and Leisure Studies ->
Vienna University of Economics and Business Administration -> Vienna
-> Austria -> Europe. Visit us on http://www.wu-wien.ac.at/itf/.
[Attachment: patch_44783]
Index: src/main/character.c
===================================================================
--- src/main/character.c (revision 44783)
+++ src/main/character.c (working copy)
@@ -1281,7 +1281,7 @@
strcat(u, t);
} while(global && (st = fgrep_one_bytes(spat, s, useBytes)) >= 0);
strcat(u, s);
- SET_STRING_ELT(ans, i, mkChar(cbuf));
+ SET_STRING_ELT(ans, i, markKnown(cbuf, STRING_ELT(vec, i)));
Free(cbuf);
}
} else {
@@ -1337,7 +1337,7 @@
for (j = offset ; s[j] ; j++)
*u++ = s[j];
*u = '\0';
- SET_STRING_ELT(ans, i, mkChar(cbuf));
+ SET_STRING_ELT(ans, i, markKnown(cbuf, STRING_ELT(vec, i)));
Free(cbuf);
}
}
More information about the R-devel mailing list