[R] gsub() with unicode and escape character
Sverre Stausland
johnsen at fas.harvard.edu
Sun Jul 17 04:19:51 CEST 2011
Dear helpers,
I'm trying to use gsub() to replace a character inside a data frame
with a Unicode character given by its escape code, but without success.
> data.frame(animals=c("dog","wolf","cat"))->my.data
> gsub("o","\u0254",my.data$animals)->my.data$animals
> my.data$animals
[1] "dɔg" "wɔlf" "cat"
It's not that a data frame cannot contain Unicode characters; cf., e.g.:
> data.frame(animals=c("d\u0254g","w\u0254lf","cat"))->my.data.2
> my.data.2$animals
[1] dɔg wɔlf cat
Levels: cat d<U+0254>g w<U+0254>lf
I've done the best I can based on what ?gsub and ?enc2utf8 tell me,
but I haven't found a solution.
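If the problem is only in how R prints the result (in a non-UTF-8 locale R may show the character as <U+0254>, as the Levels line above does), one way to check whether gsub() really inserted U+0254 is to look at the code points rather than at the printed output. This is a minimal sketch, assuming that a plain character vector can stand in for my.data$animals: before R 4.0.0, data.frame() stored strings as factors, and gsub() coerces a factor with as.character() anyway. The names `animals` and `res` are illustrative only.

```r
# Sketch: check whether gsub() really inserted U+0254 (open o),
# independently of how the console prints it in the current locale.
animals <- c("dog", "wolf", "cat")   # stands in for my.data$animals
res <- gsub("o", "\u0254", animals)

# The printed form depends on the locale; the code points do not.
utf8ToInt(enc2utf8(res[1]))   # 100 596 103, i.e. "d", U+0254, "g"
identical(res[1], "d\u0254g") # TRUE if the substitution worked

# cat() writes the strings without quoting; whether the glyph itself
# appears depends on the locale and the console font.
cat(res, sep = "\n")
```

If the code points are right, the substitution itself worked and only the display is at issue.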
A separate gsub() problem: I can't find a way to make gsub() treat the
backslash as a literal character. In regular expressions, \\ should
represent "the character \", but gsub() doesn't seem to treat it that way:
> data.frame(animals=c("dog","wolf","cat"))->my.data
> gsub("d","\\",my.data$animals)
[1] "og" "wolf" "cat"
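A likely explanation is that two levels of escaping are involved: the R string literal "\\" holds only one backslash, and in gsub()'s replacement argument a backslash is itself an escape character (it introduces back-references such as \1). A minimal sketch, using a plain character vector in place of my.data$animals, of doubling the escaping so that one literal backslash survives:

```r
animals <- c("dog", "wolf", "cat")   # stands in for my.data$animals

# In R source, "\\" is a string holding ONE backslash. In the replacement,
# a lone trailing backslash is treated as an escape and dropped.
gsub("d", "\\", animals)             # "og" "wolf" "cat", as in the post

# "\\\\" in R source is a string holding TWO backslashes; the replacement
# reads that pair as one literal backslash.
res <- gsub("d", "\\\\", animals)
print(res)      # [1] "\\og" "wolf" "cat"  (print() escapes the backslash)
cat(res, "\n")  # \og wolf cat
nchar(res[1])   # 3: backslash, "o", "g"
```

Note that print() shows the backslash doubled; nchar() or cat() confirms there is only one.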
Thank you
Sverre
More information about the R-help mailing list