[Rd] Windows, format.POSIXct and character encodings
Hadley Wickham
h.wickham at gmail.com
Wed May 1 16:06:53 CEST 2013
Hi all,
In what encoding does format.POSIXct return its output? It doesn't
seem to be UTF-8:
Sys.setlocale("LC_ALL", "Japanese_Japan.932")
times <- c("1970-01-01 01:00:00 UTC", "1970-02-02 22:00:00 UTC")
ampm <- format(as.POSIXct(times), format = "%p")
x <- gsub(">", "*", paste(ampm, collapse = "+>"))
y <- "午前+*午後"
identical(x, y)
# [1] TRUE
# But, confusingly, ...
charToRaw(x)
# [1] e5 8d 88 e5 89 8d 2b 2a e5 8d 88 e5 be 8c
charToRaw(y)
# [1] 8c df 91 4f 2b 2a 8c df 8c e3
# So there's at least a small bug in identical(): it returns TRUE
# even though the underlying bytes differ.
# This also causes a problem when you attempt to do
# stuff with the string
gsub("+", "*", x, fixed = T)
# Error in gsub("+", "*", x, fixed = T) :
# invalid multibyte string at '<8c>'
gsub("+", "*", y, fixed = T)
# [1] "午前**午後"
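One way to narrow this down, independent of the session locale, might be to rebuild both strings from the bytes shown above and compare their declared encodings as well as their raw contents. This is only a sketch: the names x_bytes, y_bytes, x2 and y2 are made up for illustration, and it assumes (unconfirmed) that x holds UTF-8 bytes that are not marked as UTF-8, and that the local iconv recognises the name "CP932".

```r
# Bytes copied from the charToRaw() output above.
x_bytes <- as.raw(c(0xe5, 0x8d, 0x88, 0xe5, 0x89, 0x8d, 0x2b, 0x2a,
                    0xe5, 0x8d, 0x88, 0xe5, 0xbe, 0x8c))
y_bytes <- as.raw(c(0x8c, 0xdf, 0x91, 0x4f, 0x2b, 0x2a,
                    0x8c, 0xdf, 0x8c, 0xe3))

# rawToChar() attaches no encoding mark, so both strings start out as
# "unknown", i.e. R treats them as being in the native encoding.
x2 <- rawToChar(x_bytes)
y2 <- rawToChar(y_bytes)
enc_before <- c(x = Encoding(x2), y = Encoding(y2))

# Assumption: x2 is really UTF-8, so declare it as such (no bytes change).
Encoding(x2) <- "UTF-8"

# Assumption: y2 is CP932, so translate it to UTF-8 for comparison.
y2_utf8 <- iconv(y2, from = "CP932", to = "UTF-8")

# If both assumptions hold, the two strings now agree byte for byte.
same_bytes <- identical(charToRaw(x2), charToRaw(y2_utf8))

# With the encoding declared, the fixed-pattern substitution can be retried.
x2_fixed <- gsub("+", "*", x2, fixed = TRUE)
```

If same_bytes comes out TRUE, that would suggest the two strings carry the same text in different encodings, with one of them missing its encoding mark, rather than being genuinely different values.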
My session info is:
R version 3.0.0 (2013-04-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=Japanese_Japan.932 LC_CTYPE=Japanese_Japan.932
[3] LC_MONETARY=Japanese_Japan.932 LC_NUMERIC=C
[5] LC_TIME=Japanese_Japan.932
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_3.0.0
Any ideas? Thanks!
Hadley
--
Chief Scientist, RStudio
http://had.co.nz/