[Rd] Inconsistency in gsub in R 2.6.2 (PR#10978)

christian.buchta at wu-wien.ac.at
Mon Mar 17 21:55:12 CET 2008




Hi,

Could this be an oversight? Without perl = TRUE, gsub() drops the
"latin1" encoding mark of its input.

R version 2.6.2 Patched (2008-03-13 r44783)
Copyright (C) 2008 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

...

 > x <- "abä"
 > Encoding(x)
[1] "latin1"
 > Encoding(gsub("ä","", x))
[1] "unknown"
 > Encoding(gsub("ä","", x, perl = TRUE))
[1] "latin1"

The code in src/main/pcre.c (see also do_tolower and do_strsplit in
src/main/character.c) suggests a fix along the lines of the attached
patch. With the patch applied:

 > x <- "abä"
 > Encoding(gsub("ä","", x))
[1] "latin1"
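
To illustrate the idea outside of R, here is a minimal, self-contained
sketch. It is a toy model, not R's internals: the names enc_t, marked_str,
make_unmarked, make_marked_like and remove_all are hypothetical, and it
assumes that markKnown() simply carries the encoding mark of the reference
element over to the newly built string, which is what its use in the patch
suggests.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an encoding mark; NOT R's internal representation. */
typedef enum { ENC_UNKNOWN, ENC_LATIN1, ENC_UTF8 } enc_t;

/* Toy stand-in for a character element: bytes plus a declared encoding. */
typedef struct {
    char *bytes;
    enc_t enc;
} marked_str;

/* Rough analogue of mkChar(): copy the bytes, leave the mark unknown. */
marked_str make_unmarked(const char *s)
{
    assert(s != NULL);
    marked_str r;
    r.bytes = malloc(strlen(s) + 1);
    assert(r.bytes != NULL);
    strcpy(r.bytes, s);
    r.enc = ENC_UNKNOWN;
    return r;
}

/* Assumed analogue of markKnown(cbuf, ref): copy the bytes and
   inherit the encoding mark of the reference string. */
marked_str make_marked_like(const char *s, const marked_str *ref)
{
    marked_str r = make_unmarked(s);
    r.enc = ref->enc;
    return r;
}

/* Remove every bytewise occurrence of pat from src. With keep_mark == 0
   the result is unmarked (the behaviour reported above); with
   keep_mark != 0 the input's mark is propagated (the patched behaviour). */
marked_str remove_all(const marked_str *src, const char *pat, int keep_mark)
{
    size_t plen = strlen(pat);
    assert(plen > 0);
    /* The result can never be longer than the input. */
    char *buf = malloc(strlen(src->bytes) + 1);
    assert(buf != NULL);
    char *out = buf;
    const char *s = src->bytes;
    while (*s) {
        if (strncmp(s, pat, plen) == 0)
            s += plen;          /* skip a match */
        else
            *out++ = *s++;      /* copy a non-matching byte */
    }
    *out = '\0';
    marked_str r = keep_mark ? make_marked_like(buf, src)
                             : make_unmarked(buf);
    free(buf);
    return r;
}
```

In this sketch, a string holding the bytes "ab\xe4" and marked ENC_LATIN1
comes back as "ab" with ENC_UNKNOWN when keep_mark is 0, and as "ab" with
ENC_LATIN1 when keep_mark is 1. The caller is responsible for freeing the
bytes member of each result.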


Happy Easter

Christian

-- 
Christian Buchta -> Institute for Tourism and Leisure Studies ->
Vienna University of Economics and Business Administration -> Vienna
-> Austria -> Europe. Visit us on http://www.wu-wien.ac.at/itf/.


Attached patch (patch_44783):

Index: src/main/character.c
===================================================================
--- src/main/character.c	(revision 44783)
+++ src/main/character.c	(working copy)
@@ -1281,7 +1281,7 @@
 		    strcat(u, t);
 		} while(global && (st = fgrep_one_bytes(spat, s, useBytes)) >= 0);
 		strcat(u, s);
-                SET_STRING_ELT(ans, i, mkChar(cbuf));
+                SET_STRING_ELT(ans, i, markKnown(cbuf, STRING_ELT(vec, i)));
                 Free(cbuf);
 	    }
 	} else {
@@ -1337,7 +1337,7 @@
 		    for (j = offset ; s[j] ; j++)
 			*u++ = s[j];
 		*u = '\0';
-                SET_STRING_ELT(ans, i, mkChar(cbuf));
+                SET_STRING_ELT(ans, i, markKnown(cbuf, STRING_ELT(vec, i)));
                 Free(cbuf);
 	    }
 	}
